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ABSTRACT 


This  research  investigates  the  problems  inherent  in  Decision  Support  Systems 
(DSS)  that  depend  on  the  quality  and  accuracy  of  legacy  information  as  the  basis  for 
decision-making.  A  Spatial  Decision  Support  System  (SDSS)  was  developed  at  Naval 
Postgraduate  School  to  analyze  the  comparative  desirability  of  Army  Reserve  Unit 
locations.  The  Army  Reserve  Installation  Evaluation  System  (ARIES)  integrates  a  GIS 
mapping  engine  and  a  decision  model  solver  in  a  flexible  environment  that  leverages 
operational  legacy  database  information  for  decision-making. 

Data  quality  problems  from  legacy  sources  motivated  the  development  of  a  data 
migration  plan  to  transform  the  source  data  into  an  architecture  optimized  for  the  ARIES 
SDSS  application.  This  research  developed  a  prototype  Data  Migration  Tool  (DMT)  to 
extract  the  relevant  source  data  into  a  centralized  repository  for  the  SDSS  with  an 
acceptable  degree  of  data  quality  to  support  SDSS  outcomes.  Six  data  quality  attributes 
were  identified:  accuracy,  completeness,  consistency,  timeliness,  uniqueness,  and 
validity.  The  ARIES  DMT  focused  on  data  validity  and  developed  techniques  for 
measuring  and  enforcing  data  validity.  The  DMT  also  specified  individual 
responsibilities  for  data  administration,  development  of  data  retrieval  routines,  and  data 
quality  assessment. 

Significant  system  performance  enhancements  resulted  from  implementation  of 
the  DMT  by  leveraging  the  spatial  aspects  of  the  underlying  repository  through 
geographic  queries  that  efficiently  localized  subsets  of  the  data  files.  Additional 
performance  enhancements  were  obtained  through  the  use  of  data  warehousing 
techniques. 
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I.  INTRODUCTION 


A.  GENERAL 

This  research  analyzes  the  problems  inherent  in  Decision  Support  Systems  (DSS) 
that  rely  upon  legacy  databases  as  the  primary  data  source.  The  quality  and  accuracy  of 
an  outcome  that  any  DSS  returns  cannot  be  better  than  the  quality  and  accuracy  of  the 
underlying  database  information.  We  explore  this  premise  in  the  context  of  a  prototype 
Spatial  Decision  Support  System  (SDSS)  developed  for  the  United  State  Army  Reserve 
Command  (US ARC)  that  allows  analysis  of  the  comparative  desirability  of  Army 
Reserve  Unit  locations.  Since  many  DSS’s  are  model-based,  initial  development  often 
focuses  on  specification  of  the  underlying  model(s)  and  associated  user  interfaces.  Issues 
concerning  the  data  required  to  run  the  models  are  frequently  left  until  the  latter  stages  of 
development. 

Initial  implementation  of  the  US  ARC  SDSS  took  exactly  this  approach  and,  as  a 
result,  encountered  serious  problems  with  the  underlying  data  that  compromised  the 
quality  of  the  decisions  that  the  SDSS  was  able  to  render.  Significant  measures  were 
required  to  resolve  these  problems;  specifically  an  entire  data  administration  module  had 
to  be  developed  to  identify  meta-data  and  regulate  the  extraction  of  DSS  data  from  source 
data  files.  This  module  required  the  adoption  of  procedures  to  assess  and  monitor  the 
quality  of  the  data  as  it  passed  from  the  source  legacy  databases  to  the  databases  used  as 
input  to  the  SDSS.  The  spatial  nature  of  much  of  the  data  put  a  special  twist  into  this 
process  since  the  SDSS  application  takes  advantage  of  these  spatial  aspects  to  streamline 
system  performance.  This  research  explores  the  confluence  of  data  quality,  decision 
support,  legacy  data,  and  spatial  data,  and  prescribes  procedures  for  dealing  with  data 
quality  in  SDSS  development.  A  major  lesson  learned  was  data  quality  must  be 
addressed  at  the  beginning  of  (S)DSS  projects  and  not  left  until  the  end  of  the 
development  cycle. 
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B.  BACKGROUND 


The  Force  Support  Package  (FSP)  Readiness  Office,  a  component  of  the  U.S. 
Army  Reserve  Command  (USARC),  is  tasked  with  assessing  and  improving  the  readiness 
of  priority  Troop  Program  Units  (TPU).  A  TPU  is  the  foundation  of  the  Army  Reserve 
force,  ranging  from  50  to  250,  typically  consisting  of  about  150  reservists.  The  TPUs 
that  are  in  the  FSP,  which  contain  the  units  designated  for  rapid  deployment,  are  of  most 
concern  to  the  Readiness  Office. 

Readiness,  in  this  context,  refers  primarily  to  personnel  readiness,  i.e.,  the  ability 
to  maintain  troops  that  are  properly  trained  and  qualified  individuals  in  a  sufficient 
number.  Many  of  the  numerous  factors  that  affect  readiness  are  dependent  on  the 
location  of  the  unit.  Relocation  of  a  unit  to  another  facility  can,  at  times,  be  the  best 
solution  when  a  unit  is  struggling  to  maintain  personnel  readiness.  During  today’s 
environment  of  force  reductions  and  realignments,  relocation  may  also  be  necessary  to 
support  force  consolidation  or  restructuring  efforts. 

Previously  these  decisions  were  based  upon  a  combination  of  personal  expertise 
and  narrowly  focused  studies.  This  ad  hoc  process  produced  results  that  often  proved 
difficult  to  communicate,  defend,  and  build  consensus  around.  The  human  decision¬ 
maker  becomes  overloaded  quickly  by  the  large  number  of  factors  involved  in  the  TPU 
relocation  decision  without  the  aid  of  an  automated  decision  tool.  The  inadequacies  of 
the  current  approach  to  provide  any  detailed  solutions  to  such  a  complicated  problem 
inspired  the  search  for  a  convenient  and  systematic  automated  tool  that  could  be  based 
upon  a  decision  model. 

The  use  of  computer  based  DSS’s  to  aid  the  decision-maker  in  making  thorough 
and  informed  decisions  will  become  more  prevalent  as  the  use  of  distributed  working 
environments  increase.  Distributed  environments  allow  access  to  more  information  that 
will  increase  the  overall  effectiveness  of  decision  support  tools.  Experience  with  the 
USARC  SDSS  indicates  that  significant  attention  must  be  paid  to  the  quality  of  data 
underlying  decision  systems  in  order  to  ensure  the  quality  of  the  resulting  decisions. 
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c. 


THESIS  OBJECTIVES 


The  primary  objective  of  this  thesis  is  to  identify  problems  with  developing  an 
SDSS  based  upon  legacy  databases  with  a  high  variance  in  data  quality.  A  secondary 
objective  is  to  develop  an  application  design  process  that,  by  incorporating  data 
warehousing  techniques,  can  counteract  the  effects  of  poor  data  quality  on  the  resulting 
application.  This  involves  analyzing  the  development  process  used  for  the  current 
prototype,  identifying  the  relevant  data  quality  factors,  reviewing  data  warehousing 
techniques,  applying  those  techniques  to  address  data  quality  problems  in  the  prototype 
application,  and  examining  lessons  learned  from  the  prototype  development  process.  The 
research  questions  that  will  be  addressed  are: 

•  What  inherent  problems  are  involved  in  the  use  of  legacy  database 
information  in  the  development  of  a  state  of  the  art  DSS? 

•  What  are  the  relevant  data  quality  factors  for  site  location  decision  problems? 

•  What  data  warehousing  techniques  are  relevant  to  the  SDSS  design  process? 

•  What  steps  should  be  taken  during  the  design  and  development  process  to 
ensure  that  the  data  quality  will  support  a  level  of  confidence  required  by  the 
user  in  the  outcome  decision? 

•  Who  should  be  responsible  for  the  level  of  data  quality  involved  in  the 
development  of  an  SDSS? 

•  What  are  the  unique  problems  that  spatially  enabled  data  present  to  the  level 
of  data  quality? 

D.  SCOPE 

This  study  will  focus  on  the  SDSS  developed  for  USARC  to  support  the  unit 
location  decision  problem  and  unit  readiness  mission  responsibilities.  The  automated 
decision  tool  supports  the  process  of  relocating  units  that  are  not  meeting  readiness  goals 
to  sites  that  afford  them  better  opportunities  to  succeed. 

The  USARC  prototype,  the  Army  Reserve  Installation  Evaluation  System 
(ARIES),  has  a  number  of  external  restrictions  imposed  that  limit  the  true  effectiveness  of 
the  system.  For  example,  only  those  facilities  currently  owned  by  the  Army  Reserve  are 
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considered  as  potential  relocation  sites  (approximately  1,500  nationwide).  The 
discussion  of  data  quality  will  be  in  reference  to  the  data  that  supports  the  decision  factors 
of  these  facilities.  For  further  details  of  the  ARIES  project  refer  to  references  1  and  2. 

The  original  project  requirements  intended  to  avoid  any  extensions  to  existing 
data  maintenance  responsibilities.  US  ARC  also  specified  that  all  model  inputs  would  be 
drawn  from  existing  data  sources.  ARIES  provides  the  decision-maker  the  ability  to 
manually  input  data  needed  to  support  additional  decision  criteria  for  incorporation  in  the 
evaluation  process.  This  off-line  analysis  would  inject  even  more  concern  for  data 
quality  and  the  subsequent  confidence  level  of  subsequent  decisions  that  can  be  reached 
using  this  tool.  This  research  does  not  address  data  quality  issues  that  arise  from  this 
source  of  “ad  hoc”  data;  rather  it  focuses  on  the  data  quality  of  the  “feeder”  legacy 
databases  and  their  extracted  counterparts  in  the  SDSS. 

The  research  sponsor  did  not  define  adequately  the  ownership  associated  with  the 
data  that  was  used  to  support  the  majority  of  the  decision  criteria.  The  lack  of  a 
responsible  custodian  left  the  interpretation  of  many  of  the  data  fields  up  to  the  designers 
and  their  ability  to  ascertain  the  meaning  of  the  underlying  database  schema. 

E.  ORGANIZATION  OF  THE  STUDY 

The  balance  of  this  study  is  organized  as  described  below.  Chapter  II  discusses 
the  design  process  and  architecture  used  in  the  development  of  the  ARIES  prototype 
project.  Chapter  III  discusses  the  basic  characteristics  and  elements  involved  in  data 
warehouses,  data  marts  and  issues  with  data  quality.  Chapter  IV  details  how  these  data 
warehousing  techniques  were  implemented  in  the  ARIES  SDSS  application.  Also 
discussed  in  that  chapter  are  the  problems  with  the  availability  and  quality  of  the  database 
information  used  in  the  decision  process  that  surfaced  throughout  the  production  of  the 
prototype  user  interface.  Chapter  V  provides  a  number  of  post  application  design  issues 
and  recommendations  for  further  study  that  could  assist  future  SDSSs  of  this  type. 
Chapter  VI  presents  conclusions  and  the  contributions  of  this  study. 
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II.  SDSS  ARCHITECTURE  DEVELOPMENT 


A.  THE  ARIES  PROJECT 


This  chapter  describes  the  design  process  and  architecture  of  the  ARIES  SDSS 
project  developed  at  the  Naval  Postgraduate  School  (NPS)  for  the  Army  Reserve 
Command.  The  ARIES  Development  Process  is  depicted  below  in  Figure  1 .  At  the  heart 
of  the  ARIES  architecture  is  a  decision  model  that  was  developed  in  conjunction  with  a 
group  of  experts  at  the  FSP  office.  The  decision  model  produces  a  list  of  decision  criteria 
that  must  then  be  mapped  to  operational  source  databases.  This  mapping  process  was 
done  by  identifying  business  rules  for  each  criterion  in  the  form  of  queries.  These 
business  rules  became  the  basis  of  the  data  model  used  in  the  application.  Once  the 
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Figure  1.  ARIES  Architecture  Development  Process 
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decision  model  and  data  model  were  finalized,  development  of  the  system  and  user 
interface  began. 

B.  DECISION  MODEL 

The  core  of  any  DSS  is  one  or  more  decision  models.  The  ARIES  decision  model 
is  a  multi-criteria  model  represented  as  a  hierarchy  of  objectives  or  goals  with  associated 
measures  or  criteria  involved  in  making  a  specific  decision.  An  objective  is  referred  to  in 
most  decision  literature  as  a  desired  direction  and  a  goal  as  the  quantifiable  progress  in 
that  direction.  For  the  purposes  of  this  discussion,  we  will  adopt  the  terminology  of  goals 
and  measures  that  is  consistent  with  the  decision  software  package  used  in  the  ARIES 
project. 

To  begin  the  design  of  a  decision  model,  a  detailed  elicitation  process  is  required 
to  capture  the  characteristics  and  aspects  associated  with  the  specific  decision  problem 
being  modeled.  This  process  identifies  a  top-level  decision  goal  that  is  subsequently 
refined  to  layers  of  subordinate  goals.  The  subordinate  goals  and  their  associated 
decision  measures  must  be  arranged  in  a  hierarchy  that  allows  the  final  analysis  to  arrive 
at  an  evaluation  for  the  top-level  goal. 

1.  Decision  Process  Elicitation 

The  first  step  in  modeling  the  TPU  location  decision  problem  was  to  gather  a 
group  of  knowledgeable  experts  in  the  area  of  Army  Reserve  manpower  and  identify  the 
top-level,  or  overall  goal.  The  use  of  experts  rather  than  an  extensive  study  was  adopted 
in  the  interest  of  cost  savings  as  well  as  the  ability  to  develop  a  working  prototype 
decision  model  in  a  short  period  of  time.  The  elicitation  process  was  done  by  focusing  on 
factors  that  were  identified  in  prior  research.  Reference  1,  of  the  TPU  readiness  issue  as 
well  as  the  process  knowledge  of  the  experts. 

The  expert  panel,  consisting  of  USARC  personnel,  was  able  to  identify  an  overall 
goal  that  related  unit  location  to  unit  readiness.  This  was  a  challenging  process  because 
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of  the  difficulty  of  placing  a  measurement  on  readiness.  Eventually  the  expert  panel 
settled  on  an  overall  goal  of  site  desirability.  The  panel  decomposed  site  desirability  into 
two  subgoals,  personnel  readiness  support  (the  ability  to  maintain  the  desired  number  of 
qualified  reservists  at  the  proposed  site)  and  facility  quality  (a  general  assessment  of  the 
costs  and  benefits  of  a  location  that  are  only  loosely  related  to  readiness). 

This  approach  to  determining  decision  goals  was  initially  done  without  any 
concern  for  the  availability  of  data  that  would  subsequently  support  the  model.  The 
elicitation  process  also  did  not  involve  any  formal  review  of  the  current  process  for 
making  this  decision.  Rather,  it  was  an  effort  to  determine  the  ideal  decision  process  for 
TPU  readiness  in  the  context  of  unit  location.  A  review  of  the  existing  decision  process 
would  have  identified  information  currently  used  to  make  an  informed  decision  which 
could  later  become  the  foundation  for  building  a  data  model.  The  lack  of  such  a  detailed 
data  model  proved  to  be  an  obstacle  to  the  project. 

2.  Decision  Goals  and  Goal  Hierarchy 

The  overall  decision  goal  of  Site  Desirability  was  broken  down  into  two  subgoals. 
Facility  Quality  and  Personnel  Readiness,  which  were  in  turn  refined  further  into  either 
additional  subgoals  or  decision  measures.  Decision  measures  are  the  basic  elements  of 
the  model  to  which  a  single  objective  value  can  be  assigned.  Each  subgoal  must  be 
ultimately  broken  down  into  these  basic  elements  to  allow  the  multi-criteria  decision 
making  to  occur.  Table  1  shows  the  breakdown  of  the  facility  quality  and  personnel 
readiness  decision  goals  into  decision  measures  for  the  ideal  decision  model.  Subsequent 
discussions  revealed  that  that  data  did  not  exist  for  some  of  the  measures  so  they  were 
dropped  from  the  initial  model.  Table  2  shows  the  decision  measures  could  to  be 
implemented  with  available  data. 

The  Facility  Quality  subgoal  is  used  to  describe  specific  attributes  of  a  proposed 
facility  (i.e.,  the  building  and  the  real  estate).  These  values  are  primarily  extracted  from 
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SITE  DESIRABILITY 


I.  Facility  Quality 

%  Administrative  Space  FT 
%  Administrative  Space  PT 
%  Motorpool  Space 
Distance  to  Headquarters 
Facility  Age 

Facility  Maintenance  Backlog 
Facility  Condition 
Facility  Operating  Costs 
Facility  Leased/Owned 

II.  Personnel  Readiness 

MOS  Qualification 

Available  Prior  Service 

Available  MOS from  Closing  Units 
Available  MOS from  the  IRR 

Fill  Level 

Market  Supportability 
Market  Quality 

Civilian  Labor  Market 
Closing  Unit  Transfers 
IRR  Availability 
Recruit  Market  Size 

Area  Units 

Area  Drill  Attendance 
Area  Loss  Rate 
Area  Transfer  Rate 
Average  Area  Manning 
Distance  to  Recruiter 
Reassignments 
Competition 
Training  Support 

Equipment  Readiness 

%  Storage  On-Site 
Distance  to  AMSA 
Distance  to  ECS 

Training  Facility 

%  Facility  Usage 
Distance  to  Special  Training 
Distance  to  WET  Site 
Distance  to  Weapons  Range 
Facility  Weekend  Use 

Table  1.  Complete  Hierarchy  of  Goals  for  Site  Desirability 
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SITE  DESIRABILITY 


I.  Facility  Quality 

Facility  Age 

Facility  Maintenance  Backlog 
Facility  Condition 
Facility  Operating  Costs 
Facility  Leased/Owned 

II.  Personnel  Readiness 

MOS  Qualification 

Available  Prior  Service 

Available  MOS  from  Closing  Units 
Available  MOS  from  the  IRR 

Fill  Level 

Market  Supportability 
Market  Quality 

Closing  Unit  Transfers 
IRR  Availability 
Recruit  Market  Size 
Area  Units 

Area  Drill  Attendance 
Area  Loss  Rate 
Area  Transfer  Rate 
Average  Area  Manning 
Distance  to  Recruiter 
Reassignments 
Competition 
Training  Support 

Equipment  Readiness 

Distance  to  AMSA 
Distance  to  ECS 

Training  Facility 

Facility  Weekend  Use 

Table  2.  Goal  Hierarchy  (showing  only  those  measures  with  automated  inputs) 


databases  maintained  by  the  Army’s  Corps  of  Engineers  and  describe  the  age,  condition, 
capacity,  and  costs  associated  with  the  major  structures  of  the  site. 
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The  Personnel  Readiness  subgoal  is  used  to  determine  the  ability  of  the  area  to 
support  personnel  readiness.  Personnel  readiness  was  broken  down  into  two  subgoals, 
Fill  Level  and  Military  Occupational  Specialties  (MOS)  Qualification  Level.  Fill  Level 
indicates  the  ability  of  the  area  surrounding  a  site  to  support  a  sufficient  number  of 
reservists  whereas  MOS  Qualification  Level  indicates  the  availability  of  the  skill  set 
required  by  the  moving  unit  in  the  area  of  the  proposed  site.  Each  of  these  goals  is 
further  broken  down  generating  a  hierarchy  of  the  goals  and  measures  that  make  up  the 
actual  decision  model. 

The  resulting  hierarchy  of  goals  represents  the  location-related  factors  that  were 
determined  by  consensus  of  the  expert  panel  to  be  important  in  the  TPU  relocation 
decision.  This  goal  hierarchy,  shown  in  Tables  1  and  2,  is  used  by  the  multi-criteria 
decision  solver  to  obtain  a  final  evaluation  of  the  desirability  of  each  site. 

3.  Decision  Measures 

A  decision  measure  is  the  result  decomposing  of  each  goal  in  the  hierarchy  into 
objective  inputs  that  can  be  qualified  and  assessed.  These  objective  inputs  can  come 
from  various  sources  such  as  databases,  spreadsheets,  data  analysis,  etc.  The  hierarchy 
developed  by  the  expert  panel  allowed  most  of  the  inputs  to  come  from  existing  database 
information,  minimizing  the  involvement  of  the  user. 

The  decision  analysis  software  integrates  all  the  dissimilar  dimensions  of  the 
measures  by  obtaining  a  common  unit  value  for  each  decision  measure.  The  common  unit 
value  is  arrived  at  through  the  use  of  yield  curves  for  the  decision  measures.  A  relative 
weight  is  also  applied  to  each  goal  to  denote  the  level  of  importance  of  that  goal 
compared  to  other  goals.  As  a  result,  certain  nodes  in  the  hierarchy  can  be  calibrated  to 
affect  the  outcome  more  strongly  than  others  by  assigning  them  higher  weights.  These 
values  are  then  summed  for  each  goal  to  determine  the  overall  desirability. 
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Appendix  A  gives  a  detailed  summary  of  each  measure  including  the  definition, 
source  information  used,  resulting  queries,  and  associated  yield  curve.  These  decision 
measures  became  the  foundation  that  for  the  application’s  data. 

C.  DATA  MODEL 

“Good  decision  support  requires  an  integrated,  stable,  well-managed  data 
resource.  ”  [Ref.  3:p.  267] 

A  DSS  requires  data  sources  from  which  to  draw  information  that  will  fully 
support  the  underlying  decision  model.  For  the  ARIES  decision  hierarchy  to  provide 
acceptable  confidence  levels  for  the  resulting  decisions,  it  must  be  based  upon  objective 
historical  data.  US  ARC  stipulated  that  the  ARIES  application  should  minimize  any 
associated  data  management.  Specifically  they  required  that  ARIES  give  rise  to  no  new 
data  administration  responsibilities.  Further,  they  specified  that  all  database  information 
used  must  be  available  or  easily  transferable  to  the  USARC  Local  Area  Network  (LAN), 
and  the  application  should  be  able  to  retrieve  information  from  those  available  data 
sources  without  regard  for  their  location.  These  basic  requirements  formed  the 
foundation  from  which  the  ARIES  data  model  was  developed. 

Using  the  goal  hierarchy  and  the  resulting  decision  measures,  steps  were  taken  to 
identify  “business  rules”  for  each  decision  measure  that  could  be  translated  into  objective 
equations.  These  business  rules  were  also  derived  by  consensus  of  the  expert  panel.  The 
business  rules  allow  source  data  elements  to  be  identified  that  provide  an  objective 
assessment  of  a  measure  and  therefore  automate  the  site  evaluation  process.  Once  the 
required  data  elements  are  identified,  the  set  of  data  files  required  to  support  the 
application  are  also  identified. 

1.  Developing  the  Business  Rules 

Using  the  ideal  decision  hierarchy  and  resulting  decision  measures,  the  expert 
panel  documented  the  factors  or  elements  comprising  each  measure  (i.e.,  Average  Area 


11 


Manning  =  Number  of  Personnel  Assigned  /  Number  of  Personnel  Required).  In 
addition  to  developing  an  objective  rule  for  each  measure,  it  was  necessary  for  the  expert 
panel  to  define  each  element  that  made  up  this  business  rule.  These  definitions  are  used 
to  determine  the  actual  data  that  are  required  to  support  each  decision  measure.  A 
complete  description  of  each  measure  and  the  associated  business  rule  is  contained  in 
Appendix  A. 

2.  Mapping  the  Business  Rules  to  Real  Data 

Each  equation  definition  identified  required  individual  data  elements.  These  data 
elements  were  then  mapped  to  operational  data  elements  available  in  database  files  on  the 
US  ARC  LAN.  It  was  determined  that  ten  of  the  decision  measures  in  the  ideal  decision 
hierarchy  did  not  have  readily  available  data  that  could  support  the  model.  These 
measures  were  not  automated  in  the  final  prototype  application. 

Logic  diagrams  were  drafted  for  each  measure  using  the  business  rule  and  the 
identifying  source  data  files.  This  information  was  gathered  through  discussions  with 
individuals  at  USARC  headquarters  familiar  with  the  requisite  database  information. 
Who  were  able  to  identify  specific  files  that  would  contain  the  required  data  elements  for 
each  measure.  Some  information  such  as  census  information  and  facility  information 
were  determined  to  be  available  through  other  sources  such  as  the  Corps  of  Engineers. 

Because  of  the  geographic  nature  of  some  of  the  decision  measures  (e.g.,  Area 
Loss  Rate),  it  became  apparent  that  a  spatial  dimension  would  be  necessary  in  the  final 
application.  This  requirement  for  area-specific  data  led  to  the  integration  of  a  Geographic 
Information  System  (GIS)  to  aid  in  the  selection,  querying,  and  visualization  of  this 
decision  problem. 

3.  Identify  Source  Data 

The  source  data  identified  include  several  types  of  database  files,  transactional 
data,  spatial  data,  personnel  data,  historical  data,  and  analysis  data.  This  wide  spectrum  of 
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dissimilar  database  types  posed  a  challenge  in  refining  the  logic  developed  for  each 
measure.  In  most  cases  these  source  databases  were  being  used  and  maintained  by 
different  entities  within  the  USARC  headquarters  facility  for  their  own  use.  The  initial 
data  sets  that  were  collected  for  the  prototype  were  extracts  of  these  databases  for  the 
state  of  Pennsylvania.  As  development  progressed  the  full  national  databases  were 
collected  and  integrated  into  the  project.  A  description  of  all  the  source  databases  is 
contained  in  Appendix  B  with  the  meta-data  information  available  for  each  file.  The 
initial  development  plan  was  to  draw  information  directly  from  each  source  file  during 
each  individual  site  evaluation  session.  This  process  was  found  to  be  inefficient  as 
discussed  later  in  the  chapter. 

D.  SYSTEM  ARCHITECTURE 

With  the  Decision  Model  complete  and  the  Data  Model  specified,  an  automated 
decision  application  could  then  be  developed  to  integrate  the  decision  model’s 
information  needs  with  the  knowledge  of  available  information  in  the  data  model.  The 
Graphical  User  Interface  (GUI)  accepts  the  required  inputs  for  the  problem  from  the  user 
and  conducts  the  evaluation  of  the  defined  scenario.  This  basic  architecture  is  depicted  in 
Figure  2. 

It  became  clear  early  in  the  project  that  budget  and  time  limitations  would  not 
allow  the  development  of  an  application  that  would  carry  out  all  functions  of  this  project 
independently.  This  led  to  the  integration  of  several  commercial  of-the-shelf  (COTS) 
products  to  conduct  the  decision  analysis  and  assist  in  the  GIS  portion  of  the  project.  The 
ARIES  application  architecture  consists  of  four  components:  an  integrating  shell,  a 
mapping  engine,  a  decision  model  solver,  and  a  data  preprocessor. 
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Figure  2.  ARIES  System  Architecture 


USER 


1.  Integrating  Shell 


The  application  shell  that  integrates  and  operates  the  GUI  is  original  code  written 
in  Visual  Basic™.  Visual  Basic™  is  an  event-driven  programming  language  that  allowed 
the  integrating  shell  to  be  developed  to  integrate  into  the  infrastructure  already  in  place  at 
USARC.  Visual  Basic™  was  chosen  as  the  programming  language  because  the  USARC 
information  system  support  personnel  were  maintaining  other  applications  with  it  and 
already  had  a  basic  level  of  understanding.  This  would  allow  for  future  maintenance  and 
improvements  to  the  prototype  to  be  completed  in  house  by  USARC  personnel. 

Another  USARC  requirement  for  the  prototype  was  that  the  final  application 
should  relieve  the  user  of  the  burden  of  understanding  the  individual  COTS  applications 
and  protocols  involved  in  the  transfer  of  information.  Because  of  the  predictable  and 
structured  nature  of  this  decision  process  automation  of  most  of  the  tasks  was  very 
effective.  The  only  required  inputs  from  the  user  are  the  moving  unit  identification  code 
(UIC)  and  the  facility  identification  code  (FACID)  for  the  proposed  sites.  Figure  3  shows 
the  ARIES  User  Interface  screen  used  to  capture  the  input  parameters. 
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Figure  3.  ARIES  User  Interface  Screen  for  Specifying  Parameters 

The  GUI  will  accept  these  inputs  either  as  manual  inputs  or  from  the  map  display. 
The  overarching  shell  uses  a  set  of  predefined  tasks  based  on  the  decision  model  to 
acquire  the  database  information  for  each  decision  measure.  Some  of  these  tasks  are 
carried  out  through  an  Objected  Linking  and  Embedding  (OLE)  connection  with  the 
mapping  engine  and  others  are  carried  out  using  the  database  engine  in  Visual  Basic™. 

Once  the  shell  has  obtained  values  for  all  the  decision  measures  associated  with 
each  proposed  site,  this  information  matrix  is  passed  to  the  decision  solver.  The  decision 
solver  carries  out  its  evaluation  and  passes  control  back  to  the  GUI  where  the  user  has  the 
ability  to  print  reports,  conduct  dynamic  analysis,  or  consider  another  scenario. 


2.  Mapping  Engine 


Maplnfo™,  already  in  use  at  USARC,  was  chosen  as  the  mapping  engine  for 
several  reasons.  Maplnfo™  satisfied  all  the  known  and  anticipated  functional 
requirements,  it  was  already  owned  by  USARC,  had  proven  to  be  well  supported  and 
documented,  and  would  minimize  the  need  for  additional  training. 

Maplnfo™  is  a  commercial  mapping  package  that  is  used  as  a  graphical  input  tool 
and  provides  for  the  spatial  definition  and  processing  of  data.  It  converts  positions  to 
distances,  makes  proximity  determinations,  and  classifies  objects  by  geographical  region. 
The  integrating  shell  uses  the  OLE  connection  to  pass  data  to  and  from  Maplnfo™  and 
launch  a  MapBasic™  program  that  executes  the  spatial  queries.  The  ability  of 
Maplnfo™  to  localize  data  from  huge  databases  provided  a  significant  performance  gain 
when  the  spatial  queries  were  implemented. 

3.  Decision  Model  Solver 

A  decision  solver  was  required  that  would  conduct  multi-attribute  utility  analysis 
and  allow  for  “what-if  ’  dynamic  analysis  functionality.  Logical  Decision  for  Windows™ 
(LDW)  is  used  as  the  decision  solver  in  the  ARIES  application.  LDW™  was  chosen 
primarily  for  its  superior  implementation  of  the  underlying  decision  framework,  Multi- 
Attribute  Utility  Theory,  and  its  ability  to  provide  a  flexible  decision  analysis 
environment. 

LDW™  was  determined  to  be  superior  than  other  similar  products  in  terms  of 
overall  ease  of  use  for  the  novice  user.  LDW™  supports  for  a  wide  range  of  techniques 
to  obtain  user  preferences  (e.g.,  ordinal  criteria  ranking,  tradeoffs,  direct  graphical  and 
tabular  inputs).  The  application  also  allows  the  user  to  set  the  specific  information  about 
the  yield  curve  that  affects  each  decision  measure  to  include:  slope,  continuous  or 
discrete,  minimum  and  maximum  values,  and  shape.  Another  important  feature  that 
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LDW™,  particularly  for  the  ARIES  prototype,  is  the  ability  to  conduct  dynamic 
sensitivity  analysis  of  an  evaluation  session. 

Given  these  fundamental  strengths,  LDW™  did  have  limitations  in  its  ability  to 
communicate  with  the  other  applications.  The  ARIES  application  must  pass  control  to 
LDW™  when  the  decision  analysis  phase  begins  to  allow  LDW™  to  work.  The  16-bit 
architecture  of  LDW™  limited  the  available  control  methods  allowing  key-stroke 
passing  as  the  only  means  to  control  the  program  externally.  This  limitation  requires  that 
the  user  be  familiar  with  and  be  able  to  carry  out  some  functions  within  LDW™  in  order 
to  take  full  advantage  of  the  capabilities  of  the  decision  model  solver.  Through  the  use  of 
these  methods  of  passing  control  and  information  the  basic  evaluation  of  a  single  site 
location  problem  is  fully  automated  to  include  report  output. 

The  ARIES  shell  passes  the  subjective  values  for  each  decision  measure  to 
LDW™  for  evaluation  against  the  stored  default  preference  set  of  the  goal  hierarchy. 
This  is  done  through  a  text  file  because  of  limitations  in  LDW™’s  capacity  to  interact 
with  other  applications.  LDW™  receives  the  matrix  of  values  with  the  facility  name  and, 
using  the  stored  yield  curves  and  assigned  weighting,  evaluates  the  specific  scenario.  The 
user  can  either  print  the  standard  reports  or  carry  out  further  analysis  of  that  scenario 
using  the  LDW™  application. 

4.  Data  Preprocessor 

The  final  component  of  the  system  application,  the  data  preprocessor,  evolved 
from  the  need  to  have  the  operational  data  move  smoothly  into  the  ARIES  evaluation 
process.  The  data  preprocessor,  like  the  shell,  is  written  in  Visual  Basic™.  Even  if  all 
source  databases  were  consistent  and  accurate,  their  number  and  sizes  present 
considerable  performance  challenges  for  a  PC-based,  front-end  processor.  Because  of  the 
size  and  the  varying  location  of  the  data  files  involved  in  the  ARIES  data  model  an 
application  that  would  provide  an  administrative  function  for  the  source  information  was 
necessary. 
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E.  DATA  PREPROCCESSOR:  ARIES  ADMINISTRATOR 

The  data  preprocessor,  known  as  ARIES  Administrator,  is  the  transition  element 
that  moves  the  operational  data  from  its  source  form  to  a  centralized  data  resource  that  the 
ARIES  application  can  access.  US  ARC  s  initial  requirement  to  maintain  the  current 
location  of  each  source  data  file  was  the  primary  reason  for  developing  this  component. 
As  the  prototype  development  progressed,  it  became  clear  that  all  the  data  elements  had 
to  be  assembled  in  one  central  location  to  facilitate  an  acceptable  performance  level 
during  problem  evaluation.  This  additional  function  was  taken  on  by  the  Administrator 
which  evolved  into  an  extracting  agent.  For  the  Administrator  to  conduct  an  extraction  of 
the  source  data  files,  queries  had  to  be  generated  and  maintained  in  order  for  the  process 
to  be  duplicated  as  the  data  files  changed.  The  ARIES  shell  and  the  Administrator  are 
separate  applications  that  are  only  connected  by  the  requirements  of  the  ARIES  data 
resource  file  structure. 

1.  Maintaining  File  Locations 

In  order  for  the  Administrator  to  find  a  file  for  the  extraction  process  the  file 
name,  path,  and  type  must  be  maintained.  This  information  is  entered  in  the 
Administrator  under  the  File  Location  tab  by  using  the  standard  windows  file  location 
interface.  Figure  4  shows  the  information  maintained  for  each  database.  The 
Administrator  also  provides,  for  informational  purposes,  a  list  of  fields,  table  names,  and 
table  indices  that  must  be  present  to  support  the  processing  performed  by  the  ARIES 
shell.  In  addition,  the  Administrator  also  maintains  a  file  location  of  the  COTS 
applications,  Maplnfo™  and  LDW™,  to  allow  flexibility  in  the  installation  of  these 
supporting  applications. 

2.  Query  Development  Process 

Development  of  extract  queries  became  necessary  to  obtain  an  acceptable  level  of 
performance  for  the  ARIES  application.  The  initial  extract  queries  were  designed  to 
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Figure  4.  ARIES  Administrator  File  Location  Screen 

retrieve  only  the  required  fields  and  records  and  place  that  information  in  the  Microsoft 
Access™  format.  This  would  allow  the  integrating  shell  to  take  advantage  of  the 
database  engine  associated  with  Visual  Basic™.  Further  development  showed  the  need 
to  add  conditional  queries  that  would  filter  unwanted  and  obviously  bad  data.  Additional 
aggregate  queries  were  added  to  improve  the  performance  and  efficiency  of  the  ARIES 
application  queries  conducted  during  runtime. 


3.  Data  Extraction  Queries 

As  each  query  was  developed,  it  was  first  tested  in  a  stand-alone  mode  and  then 
implemented  into  the  data  extraction  process.  The  Administrator  Extract  Queries  Screen 
is  shown  in  Figure  5.  These  queries  are  stored  in  an  Access™  table,  named 
Administrator,  and  identified  by  the  table  name  it  generates  for  the  ARIES  data  resource 
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Figure  5.  ARIES  Administrator  Extract  Queries  Screen 


file.  These  queries  can  be  edited  by  the  administrator  to  accommodate  changes  in  a 
source  data  file  or  future  changes  in  the  application.  The  extract  table  structure  for  each 
query,  as  required  by  the  ARIES  application,  is  documented  in  the  Administrator  under 
the  Extract  Information  area  and  can  be  reviewed  by  the  administrator.  This  documents 
the  structure  of  the  table  required  by  the  ARIES  application  so  that  the  administrator  can 
adjust  the  queries  of  the  extract  without  affecting  the  workings  of  the  application. 

4.  Data  Cleaning,  Standardizing,  and  Extracting 

The  Administrator  became  a  mechanism  to  transform  the  original  data  into  a 
consistent  data  source  for  the  ARIES  application.  For  this  reason,  many  of  the  queries 
that  were  developed  retrieve  only  the  fields  and  record  data  required  by  the  business  rules 
for  each  measure.  It  also  became  necessary  to  standardize  the  naming  of  fields  that 
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referenced  the  same  data  element  because  different  data  files  used  different  naming 
conventions  (e.g.,  UIC,  UIC1,  CURRUIC).  This  standardization  allowed  the 
application  code  to  remain  consistent  without  concern  for  the  naming  conventions  used  in 
the  source  data  files  and  also  supported  the  functionality  desired  of  allowing  source  files 
to  change  without  having  to  change  the  associated  application  code.  One  final  task  that 
the  Administrator  incorporated  was  a  basic  cleansing  process.  Certain  values  that  were 
identified  during  initial  attempts  to  query  the  data  as  being  out  of  scope  or  null  were 
removed  during  the  extract.  This  was  accomplished  by  applying  additional  criteria  to  the 
extract  queries. 

F.  CHAPTER  SUMMARY 

The  overall  SDSS  architecture  of  a  decision  model,  a  data  model,  an  integrating 
application  system,  and  a  data  preprocessor  provides  simplified  access  to  a  set  of 
powerful  tools  for  decision  support.  These  four  components  generate  a  working 
prototype  application  that  is  able  to  complete  data  analysis  in  several  minutes  that  would 
otherwise  have  taken  several  groups  of  individuals  many  weeks.  The  decision  model  is  a 
mapping  of  the  desired  decision  process  into  a  hierarchy  of  goals  and  decision  measures 
that  will  allow  subjective  inputs  to  be  achieved  for  each  measure.  The  data  model  is 
generated  by  developing  business  rules  for  each  decision  measure  and  identifying  source 
data  to  answer  each  of  those  rules.  The  integrating  application  system  was  designed  to 
bring  together  the  decision  model  and  data  model  to  generate  an  analysis  of  a  given 
scenario.  The  final  component,  the  data  preprocessor,  is  the  transformation  agent  used  to 
prepare  and  condition  the  source  data  for  use  by  the  application. 

Although  it  was  not  a  focus  of  the  original  project,  this  system  development 
process  gave  rise  the  need  for  a  data  warehouse  component.  As  the  separate  components 
came  together  under  the  original  architecture,  the  system’s  ability  to  manipulate  data 
became  a  limiting  factor.  The  data  preprocessor  became  the  transformation  agent  that 
was  able  to  remove  the  required  data  elements  from  the  operational  source  files.  This 
preprocessing  organized  the  available  information  in  a  format  that  is  to  optimized  to 
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support  the  decision  process.  By  cleansing,  aggregating,  and  extracting  from  the  source 
data  files  the  data  preprocessor  generated  a  specialized  form  of  what  is  termed  a  data 
warehouse.  Chapter  III  discusses  terms  and  issues  surrounding  data  warehouses. 
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III.  DATA  WAREHOUSING  AND  DATA  QUALITY:  TOPICAL 


DISCUSSION 

One  of  the  major  consequences  of  the  ARIES  project  development  was  the 
realization  that  it  was  necessary  to  centralize  the  location  of  the  source  data  files.  This 
specific  user  requirement  was  not  identified  at  the  beginning  of  the  project,  but  rather 
evolved  during  the  development  process  as  a  need  to  improve  system  performance  during 
an  evaluation  session.  The  process  of  structuring  and  creating  this  centralized  data 
resource  resembles  some  of  the  current  database  strategies  being  used  by  organizations  to 
take  advantage  of  enterprise  wide  database  information.  This  chapter  provides  an 
introduction  to  data  warehousing,  data  marts  and  issues  involved  in  data  quality. 

A.  DATA  WAREHOUSING 

The  idea  of  gathering  and  integrating  all  the  operational  information  of  an 
organization  in  one  place  for  the  purpose  of  conducting  analysis  has  been  a  goal  of  many 
information  mangers.  Not  until  1990,  however,  when  W.H.  Inmon  coined  the  term  data 
warehousing  was  there  a  formalized  architecture  or  thought  process  for  developing  this 
strategic  management  tool.  Data  warehousing,  when  used  properly,  will  “provide  the 
decision  maker  of  an  organization  with  the  timely  information  necessary  to  effectively 
make  critical  business  decisions.”[Ref.  4:p.  3]  Since  1990  this  concept  has  continued  to 
flourish  and  grow  to  the  point  where  today  the  data  warehousing  industry  is  estimated  at 
$15  billion  annually,  and  95%  of  the  Fortune  1000  companies  have  built,  or  are  in  the 
process  of  building,  data  warehouses.  [Ref.  5:p.  1] 

1.  Definition 

The  term  data  warehouse  is  a  “catch  all”  phrase  that  has  taken  on  many  different 
meanings.  Michael  Brackett  defines  a  data  warehouse  as  “a  repository  of  consistent 
historical  data  that  can  be  easily  accessed  and  manipulated  for  decision  support.”  [Ref.  3: 
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p.  268-269]  Marc  Demarest  defines  a  data  warehouse  as  “a  consolidation  point  for 
enterprise  data  from  diverse  production  systems.”[Ref.  6:p.  1]  W.H.  Inmon  who  coined 
the  term  initially  defines  a  data  warehouse  as  “a  subject-oriented,  integrated,  time- 
variant,  and  nonvolatile  collection  of  data  in  support  of  management’s  decision-making 
process.”[Ref.  7:p.  2]  For  the  purposes  of  this  discussion  I  will  use  Inmon’s  definition 
and  correlate  the  ARIES  project  data  resource  file  with  this  definition. 

The  concept  of  “subject-oriented”  is  based  on  the  change  from  application- 
oriented  data  to  decision-support  data.  Because  decision  making  is  the  focus,  data  in  a 
data  warehouse  will  be  aligned  around  the  major  subject  areas  of  an  organization 
whereas  operational  data  will  be  oriented  towards  specific  business  processes  it  is 
supporting.  Operational  application-oriented  data  are  detailed  data  centered  on  functional 
requirements  while  data  for  data  warehouses  will  only  include  data  for  conducting 
decision  analysis.  [Ref.  7:p.  3-4]  The  ARIES  data  resource  meets  these  criteria  because  it 
contains  unit  readiness  subject  data  to  be  used  in  the  decision  analysis  of  site  desirability. 

Integration  as  a  critical  aspect  of  the  data  warehouse  is  an  important  step  that  does 
not  always  receive  appropriate  attention.  The  main  focus  of  the  integration  process  in 
data  warehousing  is  to  obtain  consistency  throughout  the  varying  legacy  databases  from 
which  the  data  are  extracted.  Consistency  can  be  achieved  in  many  different  ways  such 
as  standardized  naming  conventions,  measurement  of  values/encoding  structures,  and 
physical  characteristics  of  data.  Integration  assures  that  data  are  stored  in  a  single, 
globally  acceptable  manner  even  if  the  underlying  legacy  systems  do  not  do  so.  [Ref.  7:p. 
5-7]  Data  warehouse  integration  was  a  critical  objective  in  creating  the  ARIES  data 
resource.  Naming  conventions  were  standardized  (e.g.,  ZIPCODE,  ZIPC,  etc.  were 
changed  to  ZIP),  and  some  data  items  were  manipulated  to  ensure  the  consistency  of  the 
encoding  of  the  data  item  (e.g.,  Nine  digit  zip  codes  changed  to  five  digits). 

One  of  the  goals  of  decision  analysis  is  to  look  at  historical  data  in  order  to  say 
something  about  the  future.  This  leads  to  the  need  for  a  time  element  in  the  data 
warehouse  to  make  it  an  effective  tool  for  decision  support.  Time  variance  in  a  data 
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warehouse  shows  up  in  several  ways.  First,  data  warehouses  represent  data  over  many 
different  periods  of  time,  encompassing  years,  year-to-day,  months,  month-to-day,  weeks 
and  days.  Second,  the  index  key  structure  of  the  data  warehouse  in  all  cases  maintains  an 
explicit  time  dimension  whereas  operational  databases  are  more  likely  to  maintain  the 
time  element  on  an  implicit  basis.  The  difference  is  that  the  data  warehouse  will  maintain 
a  specific  time  element  as  a  part  of  the  index  key.  This  is  not  the  case  in  most  operational 
data  files  where  dates  may  be  associated  with  the  file  themselves  and  not  with  each  data 
element.  Third,  the  data  in  a  data  warehouse  are  a  series  of  snapshots  from  the 
operational  database  that  cannot  be  updated.  [Ref.  7:p.  8-9]  In  the  ARIES  context,  time  is 
a  less  important  factor  than  in  many  data  warehouses;  specifically,  time  is  visible  with 
respect  to  the  date  of  the  extraction,  and  therefore  the  user  is  aware  that  the  data  are 
assumed  to  be  accurate  as  of  a  specific  date. 

The  final  defining  characteristic  of  a  data  warehouse  is  nonvolatility.  This 
concept  rises  from  the  idea  that  a  data  warehouse  contains  a  snapshot  of  the  operational 
data  and  will  not  be  updated  in  a  traditional  sense.  The  only  real  functions  that  happen  in 
a  data  warehouse  are  the  action  of  loading  the  data  into  the  warehouse  and  any  actions 
accessing  that  data  for  the  purpose  of  analysis.  This  concept  provides  a  stable  platform 
upon  which  the  decision-maker  can  base  decisions.  The  use  of  a  separate  data  picture 
relieves  strain  on  the  operational  databases  from  what  would  otherwise  be  exhausting 
analytical  queries.  [Ref.  7:p.  10-12]  In  the  ARIES  project,  the  Administrator  is  the  agent 
that  allows  the  data  resource  for  the  ARIES  project  to  meet  this  criterion  of  a  data 
warehouse.  Each  extraction  of  the  source  files  is  a  snapshot  of  the  source  data  at  that 
time.  One  difference  between  the  ARIES  Administrator  and  “standard”  data  warehouses 
is  that  the  ARIES  data  resource  file  is  replaced  in  whole  rather  than  created  as  an  addition 
to  previous  extractions  whereas  a  true  data  warehouse  would  build  on  this  historical 
dataset  while  extracting  and  loading  new  data.  The  ARIES  data  resource  differs  from 
traditional  data  warehouses  because  it  is  designed  specifically  to  take  advantage  of  the 
spatial  aspects  of  the  underlying  data  sets.  The  use  of  a  multi-criteria  decision  model  to 
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determine  the  site  desirability  of  an  area  drove  the  need  to  orient  data  based  on  its 
geographic  content. 

“By  any  definition,  however,  a  comprehensive  data  warehouse  is  much  more  than 
archived  events  equipped  with  a  general  purpose  front-end  query  tool.”  [Ref.  8:p.  1]  As 
the  definition  of  data  warehousing  continues  to  develop,  applications  of  and  uses  for  data 
warehousing  will  continue  to  expand  and  become  more  prevalent. 

2.  Applications 

Traditional  data  warehouses  fall  into  two  categories,  either  Relational  Database 
Management  Systems  (RDBMS)  or  Multi-Dimensional  Databases  (MDDB).  Only  in  the 
past  few  years  has  data  warehousing  been  viewed  as  a  way  for  organizations  to  gain 
insight  about  the  information  embedded  in  their  operational  data  sets.  The  necessity  for 
operational  data  to  be  reorganized  and  structured  in  a  data  warehouse  architecture  is 
driven  by  the  need  to  maintain  acceptable  performance  and  integrity  levels  in  both  the 
operational  and  evaluational  data  sets. 

An  RDBMS  is  a  database  system  that  organizes  and  accesses  data  as  two- 
dimensional  rows  and  columns.  Data  are  organized  so  that  related  information  can  be 
accessed  using  Structured  Query  Language  (SQL).  Data  that  are  linked  together  with 
common  key  values  will  support  a  certain  level  of  data  integrity,  but  may  create  a  large 
amount  of  overhead  at  query  time  depending  upon  the  complexity  of  the  queries  required 
to  correlate  data  elements.  Using  RDBMSs  to  support  complex  analytical  processing  and 
decision  support  has  been  difficult.  The  performance  of  a  RDBMS  is  hindered  when  it  is 
forced  to  handle  the  complex  aggregation  type  queries  expected  in  a  data  warehouse 
environment.  Each  time  a  query  is  executed  it  must  aggregate  the  data  that  the  query  is 
seeking.  This  sometimes  involves  millions  of  records.  Until  new  technologies  are 
developed  and  tested,  RDBMS  alone  would  not  be  the  best  choice  for  a  data  warehousing 
project  that  involves  numerous  complex  queries. 
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An  MDDB  is  a  data  base  technology  that  represents  multi-dimensional  data  as 
aggregations  of  data  in  cells  that  are  the  intersection  of  multiple  dimensions.  A 
dimension  is  a  table  with  a  single-part  key  that  relates  directly  to  a  fact  table  that  in  turn 
relates  all  the  dimensions  in  a  star-like  structure  using  a  multi-part  key.  Figure  6  shows 
an  example  of  a  fact  table. 


Time  Dimension 


Product  Dimension 


Store^key 

StoreName 

Address 

FloorType 

etc- 


Fact 
Timejkey 
Productjcey 
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Register^key 
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Units  Sold 
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Customer  Dimension 
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PurchaseProfite 

etc. 


Figure  6.  Example  Fact  Table.  [Ref.  9:  Figure  2] 

The  fact  table  in  a  MDDB  is  used  to  traverse  the  data  across  multiple  attributes 
quickly  whereas  dimension  tables  contain  the  actual  descriptive  data.  [Ref.  9:p  2-3]  Data 
in  the  MDDB  will  be  stored  in  forms  that  facilitate  the  common  usage  patterns  of  users. 
Summarized  data  that  are  accessed  frequently  are  preprocessed  and  made  available  for  the 
user  to  query  upon  demand,  unlike  the  RDBMS  that  would  have  to  process  the  query 
dynamically  each  time  there  is  a  request  for  that  data.  This  allows  for  quick  retrieval  of 
predefined  calculations  and  efficient  results.  [Ref.  4:p.  6] 

The  ARIES  data  resource  file  that  was  generated  to  support  the  decision  goal  of 
relating  TPU  readiness  to  site  desirability  does  not  fit  either  of  the  two  traditional  data 
warehouse  types  described  above.  Because  the  complex  queries  involved  in  the  ARIES 
project  required  aggregating  data  elements  across  many  data  files,  the  use  of  a  relational 
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model  would  have  hindered  the  performance  of  the  evaluation  session.  The  queries  for 
each  decision  measure,  listed  in  Appendix  A,  did  not  require  a  multi-dimensional  analysis 
and  therefore  there  was  no  need  for  an  MDDB  model.  The  fact  that  the  17  different 
databases  represent  data  from  different  areas  did  not  allow  for  the  separation  of  the  data 
into  formal  dimensions.  The  geographical  nature  of  the  decision  goal  and  all  its  data  files 
having  a  spatial  aspect  qualifies  ARIES  as  a  special  kind  of  data  warehouse  called  a 
Spatial  Data  Warehouse.  The  term  “ spatially  enabled ’  is  used  to  describe  data  that  have 
this  spatial  or  geographical  component.  Spatial  enabling  allows  data  to  be  related  across 
locations,  boundaries  and  other  defined  lines  that  cannot  be  done  easily  in  the  traditional 
forms  of  data  warehousing. 


3.  Design  Concerns 

Marc  Demarest  defines  four  fundamental  goals  of  a  data  warehouse  that  serve  as 
the  basis  for  complex,  forward  looking  business  modeling: 

1.  To  protect  production  systems  from  query  drain  by  moving  query 
processing  onto  a  separate  system  dedicated  to  that  task,  and  extracting 
all  the  relevant  information  from  each  production  data  source  at 
predictable  times  when  off-peak  usage  patterns  prevail; 

2.  To  provide  a  traditional,  highly  manageable  data  center  environment 
for  DSS  using  tools  and  practices  comparable  to  those  used  in  data 
center  On-Line  Transactional  Processing  (OLTP); 

3.  To  build  a  Unified  Data  Architecture  (UDA)  or  Enterprise  Data  Model 
(EDM)  in  the  warehouse,  so  that  data  from  disparate  production 
systems  can  be  related  to  other  data  from  different  production  systems 
in  a  logical,  unified  fashion. 

4.  To  separate  data  management  and  query  processing  issues  from  end- 
user  access  issues  so  that  they  can  be  treated  as  distinct  problems.  [Ref. 

6:p.  4] 

These  goals  provided  the  foundation  for  organizations  to  begin  leveraging  huge  amounts 
of  data  they  have  maintained  for  years  to  gain  a  competitive  advantage.  They  also 
provide  a  sound  basis  to  begin  the  development  of  a  data  warehouse  project  but  these 
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goals  fall  short  in  anticipating  the  dynamic  nature  of  contemporary  organizational 
computing.  The  last  two  goals  prove  to  be  the  biggest  drawbacks  in  this  respect. 

Developing  an  in-depth  UDA  or  EDM  requires  extensive  resources  and  time  to 
complete.  Many  businesses  do  not  have  sufficient  physical  or  financial  assets  to  devote 
to  a  project  that  may  not  deliver  results  for  a  relatively  long  period  of  time.  Meanwhile, 
the  rapidity  of  changes  in  business  requirements  will  inevitably  cause  the  EDM  to 
undergo  continuous  renovation.  As  a  result,  these  renovations  may  very  likely  be  costly 
without  providing  timely  responses  to  changes  in  the  analysis  needs  of  the  user 
communities. 

The  fourth  goal  of  a  data  warehouse  focuses  on  the  data  management  and 
querying  process,  and  maintaining  these  functions  separate  from  the  access  available  to 
the  end-user.  This  goal  is  based  on  the  need  to  perform  these  large-scale  functions  in  a 
mainframe  based  application  environment.  The  performance  of  current  client/server 
desktop  systems  has  put  computing  power  more  directly  in  the  hands  of  the  end-users 
who  can  handle  portions  of  these  tasks.  The  ability  of  the  user  to  manipulate  and  analyze 
data  directly  is  required  in  today’s  dynamic  business  environment.  [Ref.  6:p.  4] 

These  shortcomings  in  data  warehouse  design  architecture  gave  rise  to  a  more 
flexible  and  less  expensive  solution  to  organizations’  data  analysis  needs.  In  1991  the 
Forrester  Research  firm  declared  data  warehousing  dead  and  replaced  it  with  a  term  they 
called  data  marting.  [Ref.  6:p.  5]  The  next  section  will  discuss  the  differences  between 
data  marting  and  data  warehousing. 

B.  DATA  MARTS 

The  terms  themselves  suggest  that  the  difference  between  a  data  warehouse  and  a 
data  mart  would  be  in  the  size  of  data  maintained.  The  difference  in  size  may  be  true  in 
most  cases  but  more  significant  differences  lie  in  the  application  and  implementation  of 
the  project.  Table  3  highlights  some  of  the  major  differences  between  a  data  warehouse 
and  a  data  mart. 
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Data  Warehouse  and  Data  Marts:  What’s  the  difference? 


Typically 

Data  Warehouse 

Data  Miart 

-it  addresses: 

many  subject  areas,  perhaps 
the  entire  enterprise 

a  subject  area 

-  it  is  sized  at: 

GigaByte(GB)  to 
TeraByte(TB) 

MegaByte(MB)  to  Low 
GigaByte(GB) 

-  it  is  accessed  by: 

business  analysts  and 
front-line  users 

business  analysts  and 
front-line  users 

-  it  is  implemented  in: 

years 

months 

-  it  costs: 

$  millions 

$  tens  or  hundreds  of 
thousands 

Table  3.  Data  Warehouse,  Data  Mart  differences.  [Ref.  10:p.  9] 


1.  Definition 

A  Data  Mart  is  a  decision  support  database  application  that  provides  decision¬ 
making  solutions  for  a  narrowly  specified  group  of  knowledge  workers.  The  data  mart 
focuses  on  the  needs  of  the  knowledge  worker  and  discounts  the  underlying  production 
systems  in  an  effort  to  provide  a  DSS  solution  for  the  workers.  This  focused  approach  is 
achieved  by  keeping  the  data  mart  oriented  to  one  subject  area  versus  the  multiple 
organization  wide  approach  of  a  data  warehouse. 

Data  marts  are  more  appealing  to  the  business  community  today  because  of  the 
reasons  mentioned  in  Table  3  (e.g.,  size,  implementation  time,  and  cost).  The  smaller 
size  of  the  data  mart  compared  to  the  data  warehouse  allows  the  information  to  be 
available  to  more  users  in  the  distributed  desktop  environment  that  characterizes  today’s 
business  world.  [Ref.  ll:p.  1-2]  This  philosophy  allows  the  use  of  systems  that  are 
already  in  place  on  decision-makers’  desks  to  conduct  detailed  decision  analysis  without 
the  requirement  of  investing  in  large  amounts  of  hardware. 

The  lag  time  between  implementation  of  a  project  and  output  of  some  useful 
results  is  an  important  issue  in  the  overall  success  of  a  project.  Josh  Bersin,  Group 
Director  of  Data  Warehouse  Solutions  at  Sybase,  Inc.  indicates  that  a  data  mart  must 
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deliver  results  in  the  first  90  days.  [Ref.  1 1  :p.  2]  Because  a  data  mart  solution  can  be 
designed  and  implemented  in  a  fraction  of  the  amount  of  time,  i.e.,  months  versus  years, 
it  is  able  to  adjust  more  rapidly  to  the  changing  business  environment.  Data  marts  should 
be  designed  with  the  concept  of  expandability  in  mind  because,  as  users  explore  the 
information  available,  they  will  want  to  look  at  the  data  in  ways  for  which  it  was  not 
originally  intended.  The  capacity  of  the  data  mart  to  be  flexible  and  adjust  to  the  user 
provides  the  additional  feature  of  scalability. 

The  ability  of  an  organization  to  implement  a  data  mart  quickly  using  existing 
hardware  infrastructure  provides  an  immediate  cost  benefit.  In  today’s  business  world 
where  every  dollar  expended  is  scrutinized  closely,  it  is  important  to  provide  business 
solutions  that  offer  a  competitive  advantage  at  a  minimum  cost.  Data  marts  provide  this 
advantage  in  their  specific  subject  area.  It  is  important  for  the  organization  to  ensure  that 
data  marts  are  not  built  in  a  vacuum  and  that  each  data  mart  is  designed  with  the 
enterprise  wide  data  model  in  mind.  This  will  prevent  the  proliferation  of  stovepipe 
systems.  [Ref.  ll:p.  1-2] 

Marc  Demarest  first  discussed  the  concept  of  integration  of  data  marts  across  the 
organization  in  1993  when  he  recommended  his  solution  to  the  enterprise- wide  decision 
support  problem.  Instead  of  using  only  a  data  warehouse  or  data  mart  he  recommended 
the  use  of  a  hybrid  data  architecture.  Demarest’ s  thinking  was  ahead  of  its  time  and  he 
was  the  target  of  some  scathing  criticism  for  suggesting  the  combination  of  the  two 
philosophies.  In  his  article  “Building  The  Data  Mart”,  he  laid  down  an  architectural 
model  for  combining  a  single  warehouse  and  multiple  data  marts  into  one  integrated 
enterprise  decision  tool  that  has  become  one  of  the  most  popular  designs  of  enterprise¬ 
wide  decision  support.  [Ref.  6:p.  1] 

The  ARIES  project  data  resource  in  many  ways  fits  the  definition  of  a  data  mart. 
It  was  implemented  in  a  matter  of  months,  developed  on  a  limited  budget,  and  was 
designed  to  support  a  specific  decision  process  for  front-line  users.  However,  in  other 
ways  it  does  not  fit  the  traditional  form  of  a  data  mart.  The  ARIES  data  resource  file  is  a 
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collection  of  resources  that  has  been  aggregated  and  manipulated  to  take  advantage  of  the 
spatial  aspects  of  each  data  set.  The  data  files  are  not  maintained  in  one  of  the  traditional 
formats  of  a  data  mart  (i.e.,  relational  or  multi-dimensional),  but  rather  are  maintained  as 
separate  tables  that  will  provide  data  for  each  of  the  twenty  decision  factors  in  the  most 
expeditious  way.  The  need  to  gain  performance  speed  during  the  querying  process  forced 
the  use  of  many  geographic  queries  that  quickly  localize  data  by  the  spatial  elements  that 
are  already  present  in  the  data  sets. 

2.  Applications 

Data  marts  fall  into  similar  categories  as  the  data  warehouse  that  are  based  on  the 
intended  use  and  types  of  data  that  are  to  be  manipulated.  The  two  categories  are  based 
on  the  same  design  principals  as  discussed  earlier  in  this  chapter  for  data  warehouses, 
multi-dimensional  and  relational.  A  choice  between  these  two  design  architectures  is 
based  on  the  type  of  analysis  to  be  done  as  well  as  the  type  of  data  to  be  analyzed. 

MDDM  data  marts  are  used  to  look  analytically  at  the  same  data  in  different 
ways.  They  maintain  large  amounts  of  numeric  data  such  as  sales  data.  Once  the  data  are 
loaded,  either  from  the  data  warehouse  or  from  external  sources,  it  is  maintained  in  a  very 
structured  framework.  MDDM  data  marts  are  most  effective  for  analyzing  numeric  data 
in  an  ad  hoc  manner.  This  approach  to  analytical  processing  for  decision  support  is  called 
on-line  analytical  processing  (OLAP).  [Ref.  12:p.  4] 

The  relational  data  mart  uses  a  form  of  analysis  processing  known  as  Relational 
On-line  Analytical  Processing  (ROLAP).  ROLAP  data  marts  support  a  much  wider 
range  of  purposes  for  numeric  and  textual  data  and  therefore  allow  for  the  use  of  a  more 
general  purpose  decision  tool  than  the  MDDM  counterparts.  They  provide,  through  the 
use  of  relational  technology,  the  ability  to  conduct  both  disciplined  repetitive  queries  and 
ad  hoc  usage.  The  data  mart  concept  has  its  foundation  in  providing  the  knowledgeable 
user  with  the  decision  support  tool  that  fits  their  needs  and  provides  access  to  just  the  data 
that  the  user  needs  to  see. 


32 


3. 


Design  Concerns 


In  designing  a  data  mart  the  philosophies  are  founded  on  the  idea  of  an  application 
being  user-oriented  in  nature.  The  system  is  designed  to  provide  the  smaller  set  of  users 
with  the  exact  data  set  they  are  going  to  be  using  versus  an  enterprise  wide  set  of  data  for 
possible  decision  concerns.  This  concept  provides  for  a  somewhat  different  set  of 
processes  to  be  conducted  during  the  design  phase.  Marc  Demarest  identified  four 
distinct  processes:  [Ref.  6:p.  8] 

•  Extract  all  data  relevant  to  the  business  decision-making  of  the  groups  of 
knowledge  workers 

•  Store  the  resulting  data  sets  in  one  location:  the  data  warehouse. 

•  Create  a  unique  cut  or  series  of  cuts  of  the  data  warehouse  for  each  knowledge 
worker  community.  These  are  the  “data  marts”. 

•  Supply  the  decision-support  tools  appropriate  to  the  knowledge  workers’  style 
of  computing. 

The  extraction  process  involves  the  translation  of  the  data  to  standard  formats, 
scrubbing  the  data  for  anomalies,  and  copying  only  the  data  elements  required  for 
decision-making.  This  extraction  process  creates  a  subset  of  the  operational  data  set 
known  as  a  “cut.”  Storing  the  data  in  one  location  provides  a  big  picture  of  the  business 
for  the  major  processes  in  the  organization.  Because  different  users  throughout  an 
organization  apply  different  aspects  of  the  data  it  is  important  to  provide  each  group  of 
users  with  a  specific  cut  of  the  data  warehouse  for  their  use.  This  is  done  with  use  of  data 
marts.  Finally  and  most  importantly,  decision-support  tools  must  be  provided  that  match 
the  skill  sets  of  each  group  of  users.  If  the  user  is  unable  to  access  the  resource,  it  is  not 
an  asset  but  a  liability. 

The  ARIES  project  conforms  to  these  four  distinct  processes.  All  pertinent  data 
are  extracted  to  a  single  location.  The  extracted  data  are  intended  for  use  by  a  specific 
group  of  users  and  in  the  format  that  USARC  designed.  The  data  are  provided  through  a 
powerful  decision-support  tool  that  provides  extensive  functionality  for  the  common  user 
as  well  as  detailed  analysis  capabilities  for  the  knowledgeable  user. 
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Data  warehouses  and  data  marts  have  the  same  foundation.  The  data  they 
represent  is  the  detailed  data  maintained  in  operational  transactional  data  files.  Therefore, 
the  quality  of  the  data  retrieved  from  the  data  warehouse  or  data  mart  is  directly  related  to 
the  quality  of  the  data  in  the  underlying  operational  data  sets.  The  next  section  will 
discuss  the  issue  of  data  quality  as  it  relates  to  the  data  warehouse  environment. 


C.  DATA  QUALITY 

The  success  of  a  data  warehouse  or  data  mart  project  can  not  be  insured  by  the 
best  user  interface  or  the  newest  database  technologies  if  the  underlying  data  is  incorrect. 
The  quality  of  the  data  obtained  for  use  in  any  decision  support  tool  becomes  a  hazard 
that  the  users  must  recognize.  It  must  be  managed  during  the  design,  development,  and 
implementation  phase  of  any  project.  Consider  the  following  examples: 

•  Inaccurate  data  related  to  categorization  of  bank  customers  resulted  in 
erroneous  risk  exposure  estimates,  leading  the  bank  to  believe  it  was 
more  diversified  than  it  was.  When  the  oil  market  softened  in  Texas, 
banks  having  a  large  number  of  Texan  accounts  suffered  a  major  loss 
because  of  the  inaccurate  representation  of  risk.  [Ref.  13:p.  1] 

•  A  senior  level  military  officer  was  defending  the  defense  budget  before 
the  U.S.  Senate.  When  questioned  about  a  discrepancy  in  the  number 
of  authorized  officers  shown  in  the  proposed  budget  versus  the 
congressional  numbers,  no  one  could  explain  the  discrepancy.  That  day 
the  military  lost  2,500  authorized  officers  it  needed  because  Congress 
liked  the  lower  number.  It  turns  out  that  a  data  timeliness  problem 
within  one  of  the  data  warehouse  source  systems  were  the  reason  for 
the  discrepancy.  [Ref.  13:p.  2] 

These  examples  show  the  necessity  of  evaluating  the  value  of  the  data  that  a  decision¬ 
maker  is  using  to  make  important  decisions.  This  section  will  define  data  quality, 
identify  the  attributes  that  make  up  the  quality  of  data,  and  discuss  the  elements  that  have 
the  greatest  effect  on  data  integrity. 
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1. 


Definition 


With  the  operative  word  in  data  warehousing  and  data  marting  being  “data,”  the 
adequacy  of  the  underlying  data  used  to  build  the  data  warehouse  must  be  determined. 
“ Data  Quality”  is  the  term  used  to  identify  and  manage  the  effects  of  inadequacies  of  the 
data  in  a  decision-support  environment.  Defining  data  quality  and  maintaining  a  level  of 
data  quality  is  a  difficult  task.  It  is  a  common  theme  throughout  the  literature  that 
determining  the  quality  level  of  data  used  for  decision-making  is  important  to  the 
eventual  success  of  any  data  warehouse  project. 

Data  Quality  is  defined  by  Ken  Orr,  of  the  Ken  Orr  Institute,  “as  the  measure  of 
the  agreement  between  the  data  views  presented  by  an  information  system  and  that  same 
data  in  the  real-world.”  [Ref.  14:p.  2]  Richard  Wang  and  Yair  Wand  define  data  quality 
as  “a  multi-dimensional  concept  made  up  of  dimensions  like  accuracy,  completeness, 
consistency,  and  timeliness.”[Ref.  15:p.  87]  Michael  Brackett  states  that  data  quality  is 
an  indication  of  how  well  data  in  the  data  warehouse  meet  with  the  business  information 
demand  and  includes  data  integrity,  data  accuracy,  and  data  completeness.  [Ref.  3:p.  144] 
Duane  Hufford  says  “data  quality  is  the  state  of  completeness,  validity,  consistency, 
timeliness  and  accuracy  that  makes  data  appropriate  for  a  specific  use.”  [Ref.  13:p.  1] 

It  is  easy  to  identify  several  common  elements  in  the  attributes  that  all  of  these 
individuals  believe  make  up  data  quality.  For  the  purposes  of  this  discussion  I  will  use 
the  definition  that  is  published  by  the  Defense  Information  Systems  Agency  (DISA)  in 
the  DoD  Guidelines  on  Data  Quality  Management  that  identify  six  characteristics  of  data 
quality:  accuracy,  completeness,  consistency,  timeliness,  uniqueness,  and  validity.  [Ref. 
16:p.  2]  Table  4  gives  a  description  and  an  example  of  each  of  these  six  characteristics. 
The  application  of  the  use  of  these  six  characteristics  provide  a  sound  foundation  for  any 
organization  to  begin  the  process  of  identifying  a  confidence  level  for  the  quality  of  data 
that  are  being  used  for  decision-making.  These  characteristics  are  present  in  most  legacy 
databases  in  some  form,  which  makes  any  data  warehouse  to  which  this  information  is 
migrated  susceptible  to  the  same  types  of  errors  present  in  the  legacy  system.  “One  data 
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Data  Quality 
Characteristics 

Description 

Example  Metric 

Accuracy 

A  quality  of  that  which  is  free  of 
error.  A  qualitative  assessment 
of  freedom  from  error,  with  a 
high  assessment  corresponding 
to  a  small  error. 

Percent  of  values  that  are  correct 
when  compared  to  a  the  actual 
value.  For  example,  M=Male 
when  the  subject  is  Male. 

Completeness 

The  degree  to  which  values  are 
present  in  the  attributes  that 
require  them. 

Percent  of  data  fields  having 
values  entered  into  them. 

Consistency 

A  measure  of  the  degree  to 
which  a  set  of  data  satisfies  a  set 
of  constraints. 

Percent  of  matching  values  across 
tables/files/records. 

Timeliness 

It  represents  the  degree  to  which 
specified  data  values  are  up  to 
date  with  the  real-world. 

Percent  of  data  available  within  a 
specified  threshold  time  frame(e.g., 
days,  hours,  minutes) 

Uniqueness 

The  state  of  being  the  only  one 
of  its  kind.  Being  without  an 
equal  or  equivalent. 

Percent  of  records  having  a  unique 
primary  key. 

Validity 

The  quality  of  data  that  is 
founded  on  an  adequate  system 
of  classification  and  is  rigorous 
enough  to  compel  acceptance. 

Percent  of  data  having  values  that 
fall  within  their  respective  domain 
of  allowable  values. 

Table  4.  Data  Quality  Characteristics.  [Ref.  16:p.  2] 


manager  for  a  large  company  reported  that  fully  60%  of  the  data  that  was  transferred  to 
their  data  warehouse  failed  to  pass  the  business  rules  that  the  systems  operators  said  were 
in  force.”  [Ref.  14:p.  6] 

Because  the  recipients  of  this  “dirty  data”  are  the  resources  that  organizations  are 
using  to  make  “fact  based”  decisions  from,  it  is  important  to  understand  data  quality  as  it 
relates  to  the  project.  Organizations  create  data  warehouses  and  DSSs  to  avoid  making 
inaccurate  assumptions  about  their  business.  They  should  then  not  make  assumptions 
about  what  makes  up  the  data  set  used  to  load  the  data  warehouse.  [Ref.  17:p.  1]  Creating 
a  plan  to  improve  data  quality  is  part  of  understanding  that  legacy  data  itself  will  not  meet 
the  quality  standards  required  to  make  decisions. 
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2. 


Improving  Data  Quality 


The  process  of  improving  data  quality  is  an  incremental  process  that  can  be 
conducted  in  all  phases  of  a  project  from  birth  to  implementation  and  beyond. 

Although  database  consultant  Ken  Orr  has  called  data  warehouses  the 
sewage  treatment  plant  of  enterprise  data,  this  is  not  the  objective  of  data 
warehousing.  It  is,  unfortunately,  the  unintended  result  of  loading  legacy 
data  that  has  not  been  subjected  to  data-quality  improvement.  [Ref.  18:p. 

1] 

To  improve  the  quality  of  data  in  any  system  you  must  first  determine  the  baseline  quality 
of  the  current  data  set  must  first  be  determined.  This  is  done  by  comparing  actual  data 
instances  against  the  established  rule  sets  that  have  been  established.  The  rule  sets  will 
become  the  metrics  like  those  shown  in  Table  4  and  will  be  documented  in  the  meta-data 
of  the  data  files  for  future  reference.  If  these  rules  are  not  documented,  the  first  step  in 
improving  data  quality  is  to  document  the  business  rules  that  the  current  data  represent. 
An  important  part  of  this  initial  assessment  is  to  identify  the  responsible  stakeholder  or 
stakeholders  and  to  get  them  involved  in  the  improvement  process.  Once  the  data  quality 
baseline  assessment  is  complete  the  next  step  is  to  determine  and  document  the  level  of 
quality  required  by  the  intended  business  use. 

There  will  be  different  levels  of  quality  required  for  data  depending  on  the 

intended  use.  Ken  Orr  in  his  discussion  of  Data  Quality  and  Systems  Theory  states  that: 

No  serious  information  system  has  a  data  quality  of  100%.  The  real 
concern  with  data  quality  is  to  insure  not  that  the  data  quality  is  perfect, 
but  that  the  .  quality  of  the  data  in  our  information  system  is  accurate 
enough,  timely  enough,  and  consistent  enough  for  the  organization  to 
survive  and  make  reasonable  decisions.  [Ref.  14:p.  3] 

A  system  that  is  used  to  make  life  threatening  decisions  will  have  need  for  higher  quality 

data  than  system  used  to  identify  the  placement  of  a  new  facility.  This  required  quality 

level  or  the  “ enough  ”  that  Orr  speaks  of  can  only  be  determined  by  the  users  of  the 

system.  Once  the  user  determines  the  level  of  quality  required,  it  becomes  the  goal  of  the 

project  to  attain  that  level.  Part  of  attaining  that  level  is  to  have  in  place  the  mechanisms 

to  measure  and  maintain  that  quality  over  the  life  of  the  project.  Where  possible  this 
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should  be  done  at  the  origination  point  of  the  data  instance.  Larry  English  states  that  “to 
improve  warehouse  data  quality  you  must  improve  the  business  processes  that  produce 
the  data.”  [Ref.  16:p.  4]  If  this  is  not  possible,  quality  controls  must  be  put  in  place  in  the 
migration  path  of  the  data  from  the  operational  source  to  the  data  warehouse  or  data  mart. 
This  process  is  known  as  ""data  cleansing .”  Because  the  source  files  that  were  used  in  the 
ARIES  project  are  not  in  direct  control  of  the  customer,  the  process  of  data  cleansing  was 
used  extensively  in  the  development  of  the  ARIES  data  resource  file. 

3.  Data  Migration 

The  operation  of  moving  data  or  loading  it  into  the  data  warehouse  is  not  a  simple 
task  and  requires  substantial  planning.  As  pointed  out  previously  in  this  chapter,  if  you 
simply  move  the  data  from  the  operational  system  into  the  warehouse  you  are  moving  the 
data  problems  as  well.  The  process  of  migrating  the  data  therefore  becomes  a  point  at 
which  problems  with  the  legacy  data  can  be  identified  and  possible  solutions  can  be 
developed. 

The  migration  process  involves  a  method  to  transfer  the  data  to  the  target  data 
warehouse,  transformation  of  the  source  data  into  the  data  warehouse  architecture,  and  a 
method  to  clean  or  scrub  data  problems.  Figure  7  is  a  diagram  of  the  migration  process 
that  shows  the  flow  of  legacy  data  into  the  target  data  warehouse. 


Figure  7.  Data  Migration  Process 
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The  perfect  migration  process  would  involve  continuous  measurement  of  the  data 
being  transferred  against  the  business  rule  sets  that  is  in  place  and  has  the  ability  to 
provide  feedback  in  the  form  of  a  control  system.  The  concept  is  discussed  both  by  Ken 
Orr  in  his  paper  on  Data  Quality  and  Systems  Theory  [Ref.  14]  and  in  the  DoD 
Guidelines  for  Data  Quality  that  is  published  by  DISA  [Ref.  16]. 

The  function  of  this  migration  process  is  handled  by  the  ARIES  Administrator  in 
the  ARIES  project.  The  Administrator  maintains  the  business  rules  that  USARC 
developed  and  conducts  the  transfer  of  data  from  the  source  files  to  the  target  data 
resource  file.  It  does  not  contain  the  ability  to  ascertain  the  quality  of  the  data  being 
transferred  or  measure  that  quality  against  any  form  of  metrics.  This  process  must  be 
conducted  outside  of  the  automated  migration  process  provided  in  the  ARIES  SDSS 
project. 

D.  CHAPTER  SUMMARY 

The  philosophies  and  technology  involved  in  the  data  warehousing  process  are 
currently  in  their  infancy.  The  concept  is  growing  rapidly  as  is  shown  by  its  wide 
acceptance  in  the  leading  business  communities.  As  long  as  the  concept  of  subject- 
oriented,  integrated,  time-variant,  and  nonvolatile  data  collections  continues  to  provide 
businesses  with  a  competitive  advantage,  the  data  warehouse  will  be  at  the  center  of  the 
enterprise  decision-making  tool  set.  Data  marts  have  already  proven  to  be  an  acceptable 
way  to  provide  a  specific  cut  of  data  to  a  particular  group  of  users  with  the  an  intent  of 
providing  greater  accessibility.  The  use  of  data  warehouses  and  data  marts  in  a  common 
architecture  will  provide  even  greater  access  to  the  enterprise  data.  Greater  access  will 
give  rise  to  new  uses  for  that  data  and  allow  the  users  to  maximize  their  use  of  the 
available  data. 

The  ability  of  an  organization  to  take  advantage  of  “spatially  enabled”  data  is  a 
new  concept  that  requires  much  more  research  and  development.  Data  that  are  spatially 
enabled  will  begin  to  link  data  that  otherwise  had  no  common  link  and  will  allow  the 
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organization  to  fine  tune  their  decision-making  on  a  new  level.  The  ARIES  project  is  a 
working  prototype  of  an  application  that  has  made  every  attempt  to  maximize  the  spatial 
elements  of  the  source  data.  This  has  allowed  the  automation  of  a  decision  process 
previously  thought  to  be  too  complex  to  be  handled  in  a  client/server  environment. 

One  aspect  of  the  data  warehousing  and  data  marting  concept  that  is  consistent,  as 
long  as  the  data  being  warehoused  is  legacy  data,  is  the  need  to  determine  the  quality 
level  of  the  source  data  sets.  “Data  quality”  is  defined  by  six  characteristics  of  data: 
accuracy,  completeness,  consistency,  timeliness,  uniqueness,  and  validity.  The  quality  of 
the  data  that  are  being  used  can  be  improved  during  all  phases  of  a  project.  The 
improvement  process  involves  identifying  ownership  of  the  source  data,  conducting  a 
data  quality  baseline  assessment,  and  determining  the  required  level  of  quality  for  the 
system.  For  the  quality  improvement  plan  to  be  effective  it  must  provide  a  mechanism  of 
providing  feedback  to  the  data  systems  for  the  life  cycle  of  the  system.  This  continued 
quality  assessment  commonly  takes  place  during  the  migration  process.  This  process 
allows  the  data  to  be  transferred,  transformed,  and  scrubbed  to  meet  the  requirements  of 
the  data  warehouse  architecture. 

Chapter  IV  will  discuss  the  how  the  quality  level  of  the  source  data  files  for  the 
ARIES  SDSS  project  were  determined  and  identify  the  anomalies  that  were  identified. 
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IV.  ARIES  SDSS  APPLICATION:  PROBLEM  EVALUTION 


The  ARIES  SDSS  prototype  was  developed  in  direct  response  to  the  requirements 
set  forth  by  the  intended  users  at  US  ARC.  The  final  prototype  is  a  result  of  continuous 
adaptations  to  changing  user  requirements,  usability  improvements,  and  solutions  to 
system  design  and  implementation  problems.  Problems  were  encountered  in  several 
areas  during  the  development  process;  business  rule  development,  query  performance, 
and  data  anomalies.  This  chapter  discusses  specific  examples  of  problems  from  each  of 
these  areas  as  well  as  the  methods  used  to  overcome  them. 


A.  BUSINESS  RULE  DEVELOPMENT  PROBLEMS 

The  business  rules  of  the  SDSS  are  the  links  that  connect  the  hierarchical  decision 
goals  with  actual  data  instances  in  source  data  files.  The  business  rules  of  the  ARIES 
project  decision  measures  are  described  in  detail  and  listed  in  Appendix  A.  An  example 
of  an  ARIES  SDSS  business  rule  is  the  following: 

Area  Loss  Rate  Is  equal  to  the  number  of  losses  to  units  in  the  area 
during  the  previous  fiscal  year  divided  by  the  total 
number  of  reservists  currently  assigned  to  those  units. 

Losses  are  determined  by  counting  the  entries  in  the 
FYxxLOSS  file  where  the  data  element  “TRMN”  equals 
“LOSS”  that  are  associated  with  each  UIC  in  the  area. 

The  total  number  of  assigned  reservists  is  determined  by 
counting  all  of  the  personnel  records  in  the  G18CWE 
file  associated  with  each  UIC  in  the  area.  “In  the  area” 
is  defined  as  within  a  50  mile  radius  of  the  proposed  site. 

Some  of  business  rules  documented  during  the  project  development  phase  were 
flawed  in  their  initial  assumptions.  The  problems  encountered  fall  into  two  main 
categories:  errors  in  logic  and  rules  not  support  by  data.  The  next  sections  will  discuss 
examples  of  these  problems. 
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1. 


Errors  in  Logic 


An  example  of  a  logic  problem  occurred  with  the  requirement  by  seven  of  the  20 
measures  to  identify  all  units  “in  the  area”  of  a  proposed  facility.  The  business  rule 
developed  by  the  expert  panel  at  USARC  identified  a  file  called  COMMAND  PLAN  as 
the  best  source  from  which  to  obtain  this  information.  This  was  based  on  the  assumption 
that  COMMAND  PLAN  contained  entries  for  every  unit  and  facility  in  the  Army 
Reserve.  After  reviewing  the  actual  data  entries  in  the  file,  however,  it  was  found  that  the 
file  actually  contained  multiple  entries  for  each  site,  including  information  on  closed  and 
proposed  sites  as  well  as  active  sites.  It  was  therefore  impossible  to  identify  a  list  of  valid 
units  or  facilities  solely  from  this  data  source.  When  the  system  was  run  under  this 
assumption  the  program  failed  because  of  multiple  entries  with  same  values  in  a  key 
index  field. 

It  was  then  determined  that  this  information  must  be  derived  from  a  complex 
query  that  matched  the  entries  in  COMMAND  PLAN  against  the  G17  source  file  that 
lists  basic  facility  information  (Table  5).  This  query,  referred  to  as  VALIDJJIC,  creates  a 
table  of  UICs  that  is  created  and  only  stored  in  the  ARIES  SDSS  data  resource  file. 

Because  the  COMMAND  PLAN  source  file  contains  historic,  present,  and  future  entries 
VALIDJJIC 

SELECT  UIC,  FAC JD,  UnitName,  City,  State,  Zip 
FROM  G17Natl 

WHERE  G1 7Natl.UIC  =  ANY  (SELECT  CMDPLAN.UIC 

FROM  CMDPLAN) 

Table5.  VALID  UIC  Query 

for  each  site,  a  filter  query  was  added  to  the  extraction  process  in  the  Administrator  to 
load  only  data  instances  from  COMMAND  PLAN  that  are  valid  over  the  next  13  months, 
as  shown  in  Table  6.  The  combination  of  these  two  queries  solved  the  problem  of 
identifying  valid  Army  Reserve  units. 
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CMDPLAN 

SELECT  DISTINCT  UIC,  FACID  AS  FAC_ID,  EDATE 
FROM  COMMANDPLAN 

WHERE  (FACID  o  “N/A”)  AND  (FACID  o  “TBD”)  AND 
(LEN(FACID)  >  2)  AND  ((LEFT (ED ATE, 4)  =  ‘CCYY’ 
AND  MID(EDATE,5,2)  <=  “MM”)  OR  (LEFT (ED ATE, 4) 
<=  ‘CCYY’ 

ORDER  BY  UIC,  EDATE  DESC 
INTO  CMDPLAN 

INDEX  On  UIC  As  UIC 

Note:  Application  automatically  adjusts  the  dates  to  obtain  a  13-month 
window. 

Table  6.  COMMAND  PLAN  Filter  Query 


2.  Rule  not  Supported  by  Data 

An  example  where  underlying  data  did  not  support  the  business  rule  was 
uncovered  while  determining  the  value  for  backlogged  maintenance  actions  of  a  facility. 
Measure  #1  in  Appendix  A.  The  original  business  rule  applied  a  criterion  of  totaling  only 
the  “K-account”  unfunded  requests,  identified  by  the  fund  code  of  “BMAR”.  After 
reviewing  the  data  file  and  attempting  to  implement  this  business  rule,  it  was  discovered 
that  only  a  small  number  of  facilities  had  any  entries  with  the  BMAR  fund  code.  Because 
a  value  of  zero  for  backlogged  maintenance  receives  the  maximum  utility,  as  shown  in 
Appendix  A,  this  error  would  have  seriously  overestimated  the  contribution  of 
backlogged  maintenance  to  site  desirability  and  potentially  biased  the  outcome  of  the 
decision  model  significantly.  This  problem  was  solved,  after  discussions  with  US  ARC, 
by  totaling  all  the  unfunded  requests  without  concern  for  the  fund  code. 


B.  QUERY  PERFORMANCE  PROBLEMS 

The  user  did  not  care  how  long  the  evaluation  took  as  long  as  it  was  an  automated 
process.  The  ARIES  SDSS  project  requirements  offered  little  insight  about  the  expected 
computer  execution  time  a  site  evaluation  would  take.  During  the  beginning  phases  of 
prototype  development,  query  times  in  excess  of  one  and  half  hours  were  common.  This 
long  evaluation  time,  though  still  many  times  faster  than  the  current  manual  process,  was 
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considered  unacceptable  for  an  automated  decision  support  implementation  by  the 
development  team.  Furthermore,  the  source  files  in  use  during  the  beginning  phase  only 
contained  data  for  the  state  of  Pennsylvania.  Extending  the  data  sets  to  the  national  level 
promised  some  truly  staggering  execution  times.  Efforts  were  undertaken  immediately  to 
streamline  the  lengthy  query  by  focusing  on  two  areas  of  the  querying  process:  (1)  the 
use  of  geo-queries  in  place  of  standard  SQL  queries  and,  (2)  aggregation  of  detailed 
information  into  smaller  data  sets. 

1.  SQL  vs.  Geo-Query 

After  reviewing  each  measure  in  detail,  it  was  determined  that  14  of  the  20 
decision  measures  were  dependent  on  a  spatial  element  query  (e.g.,  Number  of  Reservists 
within  50  miles).  Because  the  application  was  using  a  GIS  system  already,  a  natural 
course  of  action  was  to  leverage  the  powerful  geocoding  abilities  of  Maplnfo™  to 
conduct  spatial  queries  as  a  way  to  reduce  the  overall  query  time.  The  Maplnfo™  queries 
executed  three  to  four  times  faster  than  conducting  the  same  query  through  standard  SQL. 
Geocoding  a  source  file  allows  Maplnfo™  to  localize  the  desired  records  and  only  look  at 
a  subset  of  the  data  file.  A  counterpart  SQL  command,  on  the  other  hand,  would  attempt 
to  match  each  record  in  the  data  file.  The  processes  of  passing  the  queries  to  Maplnfo™ 
reduced  the  evaluation  time  from  hours  to  less  than  ten  minutes. 

Another  obvious  advantage  to  using  Maplnfo™  to  conduct  other  queries 
involving  a  list  of  items  with  a  specified  area  of  a  geographical  location.  This  feature  was 
also  used  to  determine  the  following  lists  for  use  elsewhere  in  the  application:  units 
within  50  miles,  facilities  within  50  miles.  Army  reservists  within  50  miles,  members  of 
the  National  Guard  within  50  miles,  Individual  Ready  Reserve  (IRR)  individuals  within 
50  miles,  and  a  list  of  zip  codes  within  50  miles. 
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2. 


Detail  vs.  Aggregation 


One  of  the  factors  that  led  the  development  team  to  the  incorporation  of  data 
warehouse  concepts  in  this  project  was  the  requirement  which  surfaced  during 
development  to  improve  query  performance.  As  discussed  in  Chapter  III,  a  data 
warehouse  is  an  optimized  data  store  used  to  provide  data  in  a  structure  or  format  that 
will  maximize  the  performance  of  the  DSS  tool  set.  Aggregating  the  detailed  information 
in  source  files  into  summarized  tables  is  one  of  the  techniques  use  to  increase 
performance.  The  use  of  aggregation  to  improve  performance  was  an  idea  that  was  used 
heavily  in  developing  the  ARIES  data  resource.  Because  a  number  of  the  queries 
required  counting  the  records  that  match  a  particular  criterion,  the  concept  of  aggregation 
provided  an  obvious  advantage. 

An  example  of  the  benefits  of  aggregation  can  be  seen  in  the  counting  of  the 
number  of  individual  reservists  assigned  to  a  list  of  units.  The  original  process  would 
have  conducted  a  complex  query  that  counted  the  number  of  entries  in  the  G18CWE 
personnel  file  that  had  a  UIC  matching  any  one  of  the  UICs  determined  to  be  in  the  area 
of  the  proposed  site.  The  G18CWE  data  file  has  in  excess  of  200,000  records.  Matching 
each  entry  against  a  list  of  any  substantial  number  of  UICs  took  in  excess  of  an  hour.  The 
solution  was  to  add  an  aggregation  query,  shown  in  Table  7,  to  the  data  preprocessing 
phase  that  counted  the  entries  for  each  distinct  UIC  and  maintained  only  that  total.  After 
implementation  of  this  query  the  application  would  then  only  query  the  aggregate  table 
for  each  UIC  on  the  area  list  and  conduct  a  simple  summation  query.  This  form  of 
aggregation  was  implemented  in  the  extraction  process  for  four  of  the  17  databases: 
FINANCE,  FYxxLOSS,  G18CWE,  and  RPINFODT.  Those  queries  are  listed  in  detail  in 
Appendix  B.  This  process  moved  the  query  time  required  for  the  counting  process  into 

SELECT  UIC,  COUNT (UIC)  AS  UIC_TOTAL 

FROM  GlSNatl 

ORDER  BY  UIC 

GROUP  BY  UIC 

INTO  G18Natl_UIC 

Table  7.  Example  Aggregation  Query  -  G18Natl  UIC 
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the  data  preprocessing  session  and  out  of  each  individual  evaluation  session,  reducing  site 
evaluation  time  by  about  half. 

C.  DATA  ANOMALIES 

The  final  and  most  intricate  problem  area  was  the  quality  of  the  source  data.  As 
discussed  in  Chapter  III,  the  solution  a  DSS  provides  is  only  as  good  as  the  data  on  which 
it  is  based.  The  quality  of  the  source  data  files  for  the  ARIES  SDSS  project  provided  a 
substantial  challenge  to  the  development  team. 

Early  in  the  development  process,  the  frequency  of  data  values  that  were  missing 
or  null  caused  multiple  error  conditions  in  the  applications.  A  need  arose  to  identify 
missing  data  values  with  a  default  error  value  so  the  application  did  not  have  to  contend 
with  null  values.  A  value  of  “-999”  was  returned  for  a  decision  measure  as  the  default 
error  value.  Flagging  this  one  error  value  identified  many  inconsistencies  in  the  source 
data  files.  Additionally,  the  magnitude  of  the  number  of  default  error  values  that  the 
system  returned  became  a  concern  during  initial  testing  by  the  user.  The  initial  intent  of 
inserting  error  values  was  to  allow  the  user  an  opportunity  to  enter  a  subjective  value  in 
place  of  the  missing  value.  This  subjective  value  would  hypothetically  allow  the  decision 
modeler  to  provide  a  better  approximation  of  site  desirability.  Some  runs  of  the 
application  returned  so  many  default  error  values,  however,  that  the  user  would  have  been 
entering  more  values  for  decision  measures  than  the  automated  application  returned,  thus 
defeating  the  primary  purpose  of  the  system.  This  was  deemed  an  unacceptable 
condition. 


1.  Proxy  Value  Calculations 

It  was  decided  that  an  interim  solution  to  the  number  of  error  values  returned  was 
to  determine  a  proxy  or  default  value  for  each  measure.  These  proxy  values  could  be 
substituted  automatically  in  place  of  the  “-999”  values  to  allow  the  decision  model 
evaluation  to  be  conducted  without  sacrificing  authenticity  completely.  Determining  a 
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value  that  would  provide  an  accurate  representation  for  a  given  measure  required  the 
calculation  of  basic  descriptive  statistics  for  each  measure  (e.g.,  Mean,  Standard 
Deviation,  Minimum  and  Maximum  values). 

Descriptive  statistics  could  be  calculated  for  17  of  the  20  decision  measures  for 
each  facility  because  the  business  rule  did  not  depend  on  knowing  the  Moving  Unit. 
These  measures  were  facility  oriented  or  oriented  on  the  area  around  the  proposed  facility 
independent  of  any  characteristics  relating  directly  to  the  moving  unit.  The  three 
measures  that  could  not  be  calculated  in  this  way  because  they  do  depend  on  moving  unit 
characteristics  are:  Number  of  Reassignments  from  the  Moving  Unit,  Available 
individuals  with  MOSs  of  interest  from  Closing  Units,  and  Available  IRR  individuals 
with  MOSs  of  interest. 

In  an  attempt  to  determine  a  source  for  the  error  conditions,  as  well  as  calculate 
descriptive  statistics,  a  complete  evaluation  of  all  possible  sites  was  conducted.  This 
“global”  evaluation  process  allowed  the  application  to  be  tested  to  the  full  extent  of  the 
data  set  and  assisted  the  development  team  to  identify  potential  problems  quickly.  The 
procedure  was  conducted  twice  and  required  the  application  to  run  without  interruption  in 
excess  of  a  week  (using  a  Pentium  90MHz  personal  computer).  The  resulting  descriptive 
statistics  for  17  of  the  20  decision  measures  are  listed  in  Table  8.  Appendix  C  contains  a 
detailed  listing  of  the  descriptive  statistics  and  frequency  data  for  each  decision  measure. 

2.  Data  Validation 

Because  of  the  enormous  amount  of  missing  or  null  values  that  the  system  was 
returning  during  initial  evaluation  sessions,  it  became  necessary  to  verify  and  validate  the 
data  set  in  use  for  the  ARIES  SDSS  prototype.  This  was  required  to  localize  the  problem 
and  determine  if  the  problem  was  with  the  data  or  in  the  application  implementation. 

Before  the  data  set  could  be  validated,  an  appropriate  range  of  data  values  had  to 
be  determined  for  each  measure.  The  limits  of  the  ranges  were  determined  using  a  rule  of 
reasonableness  to  identify  values  that  would  adversely  affect  the  evaluation  process. 
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Measure 

Obs 

(N) 

Min 

Value 

Max 

Value 

Mean 

Std 

Dev 

1 .  Facility  Backlogged  Maint. 

1,205 

0 

11,979,371 

448,131 

837,391 

2.  Facility  Operating  Cost 

1,251 

0.0 

293.5 

3.1 

9.7 

3.  Facility  Age 

765 

0 

1,677 

295.1 

173.3 

4.  Facility  Condition 

1,251 

N/A 

N/A 

N/A 

N/A 

5.  Facility  Owned 

1,319 

N/A 

N/A 

N/A 

N/A 

6.  Competition 

1,300 

18 

20,759 

4,116.3 

3,960.0 

7.  Area  Drill  Attendance 

1,300 

0.20 

0.82 

0.58 

0.06 

8.  Area  Loss  Rate 

1,325 

0.00 

0.86 

0.32 

0.11 

9.  Area  Transfer  Rate 

1,300 

0.00 

1.84 

0.27 

0.20 

1 0.  Area  Average  Manning 

1,325 

0.00 

1.94 

0.86 

0.20 

1 1 .  Distance  to  Recruiter 

1,325 

0 

7,619.9 

18.2 

287.7 

12.  Area  Available  Closing  Unit 

819 

1 

504 

75.2 

114.1 

13.  IRR  Available 

1,315 

1 

3,497 

395.8 

r658.9 

14.  Area  Recruit  Market 

1,316 

253 

214,738 

33,189.9 

41,290.4 

16.  Distance  AMSA 

1,325 

0 

7,619.9 

42.4 

289.1 

17  Distance  ECS 

1,325 

0 

268.1 

510.1 

18.  Facility  Weekends  Used 

1,320 

0 

f 

1.6 

1.0 

Note:  Measures  15,  19,  and  20  are  dependent  on  the  Moving  Unit 
Total  Number  of  Facilities  (N):  1 325 


Table  8.  Descriptive  Statistics  for  Decision  Measures 

Consideration  was  given  to  the  following  areas;  (1)  the  range  of  values  returned  during 
the  evaluation  process,  (2)  expected  values  based  on  the  Yield  Curves,  and  (3)  common 
sense  (i.e.,  a  value  of  zero  for  Facility  Age  was  not  considered  reasonable).  The  valid 
ranges  for  each  measure  are  listed  on  Table  9. 

As  indicated  in  Table  9,  major  problems  with  at  least  six  of  the  20  decision 
measures  were  identified.  The  validation  was  conducted  on  the  list  of  1523  available 
facilities.  The  review  identified  a  serious  problem  with  files  which  contained  missing 
values;  in  some  cases  the  files  were  missing  as  much  as  57%  of  the  values  (Area  Drill 
Attendance).  A  detailed  review  of  the  data  files  in  question  determined  that  the  files  did 
not  contain  information  for  any  state  other  than  Pennsylvania.  This  problem  was  a  result 
of  the  user  requirements  from  a  prototype  for  Pennsylvania  data  to  a  national  data  set. 
This  change  occurred  during  the  development  process  without  the  data  files  being 
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Measure 

Missing 

Values 

(%) 

Out  of 
Range 
(%) 

Potentially 

Valid 

<%) 

Valid 

Range 

1 .  Facility  Backlogged  Maint. 

14.8 

2.5 

82.7 

liH— 

2.  Facility  Operating  Cost 

8.5 

24.4 

67.1 

3.  Facility  Age 

49.6 

0.1 

50.2 

x3>0  j 

4.  Facility  Condition 

8.5 

0.0 

91.5 

x4  =  G  or  A  or  R 

5.  Facility  Owned 

1.0 

0.0 

99.0 

x5  =  Y  or  N 

6.  Competition 

2.7 

0.0 

97.3 

0<x6<  21,000 

7.  Area  Drill  Attendance 

57.1 

8.4 

34.5 

0  <  x7  <  1 .0 

8.  Area  Loss  Rate 

2.4 

60.3 

37.2 

0  <  x8  <  1 .0 

9.  Area  Transfer  Rate 

2.7 

56.5 

40.8 

0  <x,  <  1.0 

10.  Area  Average  Manning 

0.0 

3.2 

96.8 

0<x10<  1.5 

11.  Distance  to  Recruiter 

0.6 

0.7 

99.3 

x„  >  500 

12.  Area  Available  Closing  Unit 

38.1 

0.0 

61.9 

x12  ^  0 

13.  IRR  Available 

1.3 

0.0 

98.7 

XB  >  0 

14.  Area  Recruit  Market 

1.2 

0.0 

98.8 

X,4>0 

16.  Distance  AMS  A 

0.0 

0.7 

99.3 

x16  >  500 

17  Distance  ECS 

0.0 

11.6 

88.4 

x17  >  500 

18.  Facility  Weekends  Used 

0.9 

0.0 

99.5 

A 

00 

X 

Note:  Measures  15, 19,  and  20  are  dependent  on  the  Moving  Unit. 
Total  Number  of  Facilities  (N):  1523 


Table  9.  ARIES  Measures  Analysis  Statistics  -  Run  #1 

updated  to  match  the  new  national  search  requirements.  Databases  containing  nationwide 
information  were  used  to  conduct  a  second  analysis  and  evaluation  session  as  discussed 
later  in  Table  10. 

A  problem  was  also  identified  when  using  a  ratio  as  the  measuring  metric.  A  ratio 
does  not  reflect  the  magnitude  of  the  underlying  values  used  to  obtain  that  ratio.  In  the 
case  of  the  value  for  Area  Drill  Attendance,  for  example,  the  application  returned  a  value 
of  0.8  for  a  facility,  which  would  be  considered  within  the  expected  range.  The  value  for 
Area  Drill  Attendance  involves  calculating  the  ratio  of  reservists  meeting  satisfactory 
drilling  requirements  to  the  total  number  of  reservists  required  to  drill  at  a  facility.  The 
application  was  constructed  to  store  the  interim  values  of  DRILL  SAT  and 
DRILL_TOTAL,  which  were  calculated  to  be  4  and  five  5  respectively,  resulting  in  a 
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ratio  of  0.8.  However,  these  numbers  did  not  match  the  total  number  of  reservists 
actually  assigned  which  was  2496.  This  wide  disparity  was  traced  back  to  the  same 
problem  leading  to  57%  of  the  facilities  having  missing  values  for  Area  Drill  Attendance. 
This  problem  was  resolved  when  the  FINANCE  file  was  updated  to  reflect  entries  on  a 
national  basis.  However,  it  is  still  possible  for  a  measure  represented  as  a  ratio  to  hide 
potential  data  problems.  A  good  strategy  in  this  case  is  to  display  the  basic  values  that 
comprise  the  ratio  in  addition  to  the  ratio  itself.  Five  of  the  decision  measures  in  the 
ARIES  SDSS  project  use  ratios:  Area  Drill  Attendance,  Area  Loss  Rate,  Area  Transfer 
Rate,  Average  Area  Manning,  and  Reassignments.  A  detailed  description  for  the 
calculation  process  for  each  measure  is  shown  in  Appendix  A. 

One  potentially  deceptive  measure  for  which  values  calculated  during  the  analysis 
may  provide  false  feedback  is  Area  Available  Closing  Units.  This  measure  is  used  to 
determine  the  number  of  available  reservists  from  units  that  are  scheduled  to  close  in  the 
area  of  a  proposed  site.  Because  the  number  of  closing  units  is  small  in  comparison  to 
the  number  of  active  units,  not  all  facilities  will  have  units  scheduled  for  closing  within 
50  miles.  However,  the  application  will  return  a  default  error  value  for  this  case  that 
should  not  be  considered  an  error.  In  this  case  a  value  of  zero  could  be  the  valid  answer. 
Because  it  is  difficult  to  distinguish  between  missing  data  or  the  fact  that  there  may  be  no 
closing  units,  the  application  currently  does  not  compensate  for  this  situation  and  the 
correction  is  left  to  the  user. 

A  second  analysis  session  was  conducted  on  a  list  of  facilities  known  to  be  active 
Army  Reserve  facilities  in  the  continental  United  States.  The  original  list  of  1523 
facilities  was  pared  down  by  a  total  of  198  sites  to  a  total  of  1325  by  removing  facilities 
in  remote  locations,  facilities  marked  as  not  existing,  and  facilities  marked  as  temporary. 
Appendix  C  contains  the  frequency  data  and  descriptive  statistics  for  each  measure  that 
was  produced  by  this  second  analysis  run.  Table  10  shows  the  validity  analysis  statistics 
for  each  measure. 
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Measure 

Missing 

Values 

(%) 

Out  of 
Range 
(%) 

Potentially 

Valid 

(%) 

Valid 

Range 

1 .  Facility  Backlogged  Maint. 

9.1 

1.7 

89.2 

0  <  x,  <  20M 

2.  Facility  Operating  Cost 

5.6 

18.3 

76.1 

0  <x2  <  100 

3.  Facility  Age 

42.2 

0.2 

57.6 

o 

A 

X 

4.  Facility  Condition 

5.6 

0.0 

94.4 

x4  =  G  or  A  or  R 

5.  Facility  Owned 

0.5 

0.0 

99.5 

x5  =  Y  or  N 

6.  Competition 

1.9 

0.0 

98.1 

0<x6<  21,000 

7.  Area  Drill  Attendance 

1.9 

0.0 

98.1 

0  <  x7  <  1.0 

8.  Area  Loss  Rate 

0.0 

2.1 

97.9 

0  <  xg  <  1 .0 

9.  Area  Transfer  Rate 

1.9 

1.1 

97.0 

O 

A 

# 

A 

i— * 

o 

10.  Area  Average  Manning 

0.0 

2.3 

97.7 

0<xlo<  1.5 

1 1 .  Distance  to  Recruiter 

0.6 

0.1 

99.9 

xn  >  500 

12.  Area  Available  Closing  Unit 

38.2 

0.0 

61.8 

xn>0 

13.  IRR  Available 

0.7 

0.0 

99.3 

O 

A 

x" 

14.  Area  Recruit  Market 

0.6 

0.0 

99.4 

o 

A 

X* 

1 6.  Distance  AMSA 

0.0 

0.1 

99.9 

x,g  >  500 

17  Distance  ECS 

0.0 

10.6 

89.4 

x17  >  500 

18.  Facility  Weekends  Used 

0.4 

0.0 

99.6 

A 

00 

x" 

Note:  Measures  15, 19,  and  20  are  dependent  on  the  Moving  Unit. 


Total  Number  of  Facilities  (N):  1325 


Table  10.  ARIES  Measures  Analysis  Statistics  -  Run  #2 

Five  of  the  measures  in  this  run  fall  below  a  90%  potentially  valid  level.  Three  of 
these  measures  (Facility  Backlogged  Maintenance,  Facility  Operating  Cost,  and  Facility 
Age)  are  facility  related  and  are  the  result  of  missing  data  or,  in  the  case  of  facility 
operating  cost,  values  of  zero. 

The  missing  values  for  Area  Available  Closing  Unit,  as  discussed  above,  are  a 
result  of  proposed  sites  not  having  any  closing  units  within  the  geographic  area.  In  this 
case  it  would  be  acceptable  for  a  site  to  return  a  null  value  for  this  measure.  The  values 
for  the  Distance  to  ECS  measure  are  a  result  of  the  Army  Reserve  having  only  30  ECS 
sites  nationwide.  As  a  result  some  sites  will  have  a  distance  of  greater  than  500  miles  to 
the  nearest  ECS  site.  This  “out  of  range”  distance  value  will  be  interpreted  by  the 
decision  model  and  assigned  the  minimum  utility  value  based  on  the  shape  of  the  yield 
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curve.  Hence,  these  invalid  range  problems  for  these  two  variables  will  not  adversely 
affect  the  decision  model. 

Missing  or  null  values  in  the  data  files  are  identified  in  the  application  by  a  default 
error  value  which  can  be  changed  easily  by  the  user  in  order  not  to  affect  adversely  the 
outcome  of  the  evaluation.  The  values  that  are  out  of  range,  on  the  other  hand,  can  have  a 
direct,  negative  impact  on  an  evaluation  if  they  are  not  identified  and  corrected.  Consider 
Facility  Operating  Cost  measure  as  an  example.  Of  the  18.3%  of  the  values  that  were 
found  to  be  out  of  range,  all  but  two  were  equal  to  zero.  This  is  a  concern  in  the 
evaluation  process  because  a  value  of  zero  will  receive  the  maximum  utility  during  the 
decision  analysis  phase.  It  is  true  that  a  closed  facility  or  a  proposed  facility  would  not 
have  a  current  value  for  the  operating  costs  but  in  the  case  of  an  active  facility  this  value 
should  be  something  greater  than  zero.  Currently  this  situation  requires  the 
knowledgeable  user  to  intervene  and  apply  a  reasonable  value.  This  is  a  particularly 
insidious  example  of  how  incomplete  or  inaccurate  values  in  source  files  can  filter 
through  the  data  warehouse  into  the  DSS. 

3.  Data  Quality  Analysis 

The  data  analysis  conducted  for  the  SDSS  application  was  only  concerned  with 
determining  data  validity.  A  complete  analysis  would  consider  all  the  characteristics  of 
data  quality  discussed  in  Chapter  III  including  accuracy,  completeness,  consistency, 
timeliness,  and  uniqueness  as  well  as  validity.  Below  we  present  some  examples  of  data 
problems  encountered  during  the  development  phase,  which  relate  to  these  other  data 
quality  characteristics. 

Accuracy 

Problem:  (Facility  Condition  -  Measure  #  4) 

Every  value  for  facility  condition  is  “GREEN”.  It  is  unreasonable 
to  expect  that  no  facilities  would  be  coded  either  AMBER  or  RED, 
therefore  it  seems  quite  unlikely  that  these  values  match  the  actual 
condition  of  each  facility. 
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Solution: 


These  values  must  be  updated  by  the  owner/stakeholder  of  the 
operational  data  file. 


Completeness 

Problem: 

(Facility  Age  -  Measure  #3) 

The  source  data  file  is  missing  42%  of  the  facilities  deemed  to  be 
valid  from  the  GEOREF  file. 

Solution: 

These  values  must  be  updated  by  the  owner/stakeholder  of  the 
operational  data  file. 

Problem: 

GEOREF  File 

Zip  codes  are  missing  on  96  of  the  facilities  listed  in  GEOREF,  20 
of  which  are  marked  as  valid  facilities. 

Solution: 

These  values  must  be  updated  by  the  owner/stakeholder  of  the 
operational  data  file.  The  zip  code  is  used  to  geocode  these 
facilities  for  possible  selection  as  a  proposed  site. 

Consistency 

Problem: 

Zip  Codes  are  of  varying  length  in  all  the  data  files.  Some  files 
contain  the  nine  digit  zip  codes  while  others  only  maintain  five 
digit  codes. 

Solution: 

In  order  to  query  on  zip  codes  and  obtain  an  exact  match,  the  use 
of  five  digit  zip  codes  was  adopted.  This  was  done  during  the 
extraction  process  by  only  loading  the  first  five  digits  of  a  zip  code 
from  all  the  files. 

Problem: 

UICs  are  not  represented  uniformly  in  all  the  source  data  files. 
Some  files  use  a  UIC  designation  that  does  not  include  the  letter 
designating  an  active  unit.  Other  data  files  use  a  parent  UIC 
instead  of  the  UIC  of  an  actual  unit,  (e.g.,  unit  structure  is  against 

the  parent  UIC, . AA,  whereas  the  person  assigned  to  a  billet  is 

assigned  at  the  platoon  level  UIC, . Al) 

Solution: 

Ensured  all  UICs  were  six  digits  in  length. 

Problem: 

Data  entries  for  facilities  and  units  in  source  files  do  not  match  the 
list  of  valid  units. 

Solution: 

Not  corrected.  Action  required  by  the  owners  of  the  source  data 
files. 
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Timeliness 

Problem:  (Area  Loss  Rate,  Area  Transfer  Rate,  Measures  #  8  &  9) 

FYxxLOSS  File  used  to  determine  loss  and  transfer  rate  can  be  as 
much  as  twelve  months  out  of  date. 

Solution:  No  current  solution. 


Uniqueness 

Problem:  The  data  files  do  not  have  unique  indexes.  The  lack  of  a  unique 

list  of  facilities  and  units  has  allowed  entries  in  source  data  files  for 
sites  that  do  not  exist. 

Solution:  No  current  solution. 

These  examples  are  not  exhaustive  and  are  only  intended  to  be  representative  of  the 
problems  that  exist  in  the  source  data  files  for  the  ARIES  SDSS  project.  Further  analysis 
is  required  in  the  area  of  the  other  five  data  quality  characteristics  to  determine  the  overall 
level  of  data  quality  that  is  present  in  the  final  application.  Because  the  SDSS  is  based  on 
data  files  that  are  updated  on  a  frequent  basis,  there  is  a  need  to  monitor  the  quality  level 
of  the  data  following  each  future  extraction. 


D.  CHAPTER  SUMMARY 

As  with  any  prototype  DSS  development  project,  substantial  problem  areas  arose 
with  respect  to  data  quality  issues.  Three  major  categories  of  data  quality  problems  were 
identified:  business  rule  development,  unacceptable  query  performance,  and  data 
anomalies.  Though  these  problems  provided  a  considerable  challenge  to  the  development 
team,  acceptable  solutions  have  been  implemented  in  most  cases. 

The  documentation  of  the  business  rules  provides  the  building  blocks  of  the  SDSS 
application.  In  the  ARIES  application,  problems  with  the  business  rules  manifested  as 
logic  errors  or  lack  of  data  support.  Solutions  to  these  problems  can  be  developed  for 
both  types  of  business  rule  errors.  Unfortunately,  these  errors  usually  show  up  during  the 
initial  phases  of  the  application  development  and  must  be  resolved  to  allow  the 
application  to  continue  with  development. 
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Because  the  SDSS  application  is  query  intensive,  the  performance  level  of  the 
application  will  depend  on  the  ability  to  carry  out  each  query  in  the  most  efficient 
manner.  The  ARIES  application  required  performance  enhancements  in  two  areas:  geo¬ 
queries  for  spatially  related  queries  and  preprocessed  data  aggregation.  These  relatively 
simple  solutions  provided  a  performance  enhancement  that  reduced  the  evaluation  time 
for  each  site  by  an  order  of  twenty  fold,  from  hours  to  a  matter  of  minutes. 

The  validity  and  quality  of  the  source  data  directly  affects  the  quality  of  the 
resulting  evaluation  in  the  ARIES  SDSS  application.  An  understanding  of  the  data 
problems  involved  in  the  ARIES  project  did  not  become  a  concern  until  the  application 
was  in  full  development.  Because  of  the  magnitude  of  the  problems,  steps  were  initially 
taken  to  localize  the  source  of  problems.  Through  a  detailed  validation  analysis,  a  large 
portion  of  the  data  anomalies  was  corrected.  Data  problems  were  handled  either  at  the 
preprocessing  stage  with  filtering  queries  or  else  corrected  at  the  source  file.  The 
correction  of  these  anomalies  on  a  post  facto  basis  consumed  a  major  amount  of  time  that 
detracted  from  development.  This  opportunity  cost  of  undertaking  remediating  action 
could  be  reduced  significantly  with  prior  planning. 

The  problems  that  the  ARIES  SDSS  prototype  application  encountered  were  in 
most  cases  not  anticipated.  The  need  to  reduce  the  impact  of  these  problem  areas  on  the 
application  development  process  requires  changes  in  the  initial  steps  of  the  SDSS 
development  process.  Chapter  V  discusses  the  lessons  learned  from  this  initial  prototype 
development  and  proposes  several  requirements  that  should  be  added  to  the  development 
process  in  order  to  identify  solutions  for  these  problem  areas  earlier  in  the  project  life 
cycle. 
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V.  LESSONS  LEARNED:  SDSS  DESIGN  AND  DEVELOPMENT 


The  ARIES  SDSS  project  prototype  is  in  final  implementation,  but  it  did  not  get 
to  that  point  without  encountering  a  number  of  major  problems  that  could  have  been 
avoided  with  prior  knowledge.  The  lessons  learned  from  this  project  will  be  valuable  to 
the  development  of  second-generation  SDSS  projects. 

The  majority  of  the  problems  with  the  ARIES  project  centered  on  the  wide 
disparity  between  the  initial  system  concept  and  the  final  product.  The  original  goal  of 
the  project  was  to  develop  a  decision  model  for  the  TPU  readiness  issue.  However,  the 
final  product  is  a  fully  functional  automated  decision  support  tool.  As  the  ARIES  project 
grew  with  each  new  functionality  the  development  process  itself  received  less  and  less 
attention.  Writing  and  testing  code  became  the  focus.  While  this  is  a  common  scenario 
for  prototype  applications,  serious  problems  can  arise  when  a  decision  is  made  to  take  the 
prototype  to  full  implementation  with  little  planning,  which  is  what  occurred  in  this 
situation. 

This  chapter  discusses  recommended  changes  to  the  SDSS  development  process 
to  avoid  the  pitfalls  that  hampered  the  development  of  the  original  ARIES  prototype 
application.  These  changes  include  the  addition  of  a  data  migration  plan,  refinements  to 
the  decision  model,  data  model,  and  system  design,  and  future  considerations. 

A.  SDSS  DEVELOPMENT  PROCESS 

The  ARIES  SDSS  began  as  a  project  to  elicit  an  expert  system  decision  model  for 
placement  of  Army  Reserve  TPU  units  using  a  GIS  to  display  locational  decision  factors. 
As  the  users  generated  additional  requirements,  the  project  evolved  into  an  integrated 
application  data  resident  model  involving  the  use  of  a  GIS  application  and  a  DSS 
application.  Because  the  original  scope  of  the  project  centered  on  the  accuracy  of  the 
decision  model,  the  retrieval  of  data  and  data  quality  issues  supporting  that  model  were 
not  in  the  original  considerations. 
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This  complacency  about  the  underlying  data  mirrors  a  recurring  theme  in  the  data 
warehouse  development  literature.  Joe  Celko  and  Jackie  McDonald  indicate  the  likely 
consequences  of  such  oversights. 

“Ignore  or  trivialize  problems  with  the  existing  data  at  the  start  of  the 
project,  and  that  oversight  will  brutally  assert  itself  when  data  problems 
begin  to  surface  as  you  populate  the  warehouse  from  outside  data  sources, 
current  applications,  and  legacy  data.”[Ref.  1 9:p.  1] 

Unfortunately,  this  statement  was  particularly  appropriate  in  the  case  of  the  ARIES 

prototype  application.  Overlooking  the  migration  of  the  source  data  to  the  application  was 

a  major  oversight.  This  error  has  led  us  to  recommend  a  revision  of  the  SDSS 

development  process  that  is  documented  in  Chapter  II  (Figure  8).  The  major  difference 

from  the  original  process  is  the  addition  of  a  data  migration  plan  (DMP)  discussed  below. 


★ 


Identifies  recommended  changes  to  the  original  development  process. 

Figure  8.  Recommended  SDSS  Development  Process 
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B.  DATA  MIGRATION  PLAN 


As  the  ARIES  project  developed,  the  data  model  evolved  from  merely  querying 
source  files  into  a  specialized  form  of  a  data  warehouse.  The  importance  of  this  transition 
was  not  well  identified  or  understood  by  the  ARIES  development  team  at  the  time. 
Because  the  underlying  data  have  a  direct  effect  on  each  phase  of  the  development 
process  (decision  model,  data  model,  and  system  design),  the  data  resource  became  a 
critical  factor  to  the  success  and  completion  of  the  project. 

A  well  thought  out  quality  migration  plan  for  source  data  can  ease  the  labor  of 
application  development.  [Ref.  20:p.  1]  As  a  crucial  piece  that  integrates  the  application, 
the  migration  plan  is  responsible  for  transforming,  transporting,  and  scrubbing  the  data. 
This  requires  substantial  prior  planning. 

1.  Designate  Migration  Team 

The  most  important  action  to  be  taken  in  developing  a  data  migration  plan  is  to 
identify  the  group  of  individuals  that  will  be  responsible  and  take  ownership  for  the  data 
portion  of  the  project.  A  team  of  individuals  should  be  named  early  in  the  project 
development  that  will  be  responsible  for  making  the  required  data  available  to  the 
application.  These  team  members  should  consist  of  both  business  and  technical 
individuals  from  the  application  agency. 

Members  of  the  migration  team  will  be  responsible  for  gathering  source  files, 
analyzing  those  files,  and  maintaining  the  meta-data/business  information  about  each  file 
and  its  elements.  This  information  about  the  source  files  is  important  to  the  process  of 
mapping  data  elements  to  the  individual  decision  measures.  The  understanding  that  can 
be  gained  by  documenting  the  business  knowledge  underlying  each  data  element  and 
each  data  file  will  allow  the  development  team  to  implement  the  business  rules  with  a 
minimum  of  disruption.  As  each  source  file  is  analyzed  and  identified  for  use  in  the 
application,  a  process  should  be  initiated  for  maintaining  the  business  information  about 
each  file.  This  should  include  at  a  minimum  the  information  that  can  be  documented  on 
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the  Source  File  Documentation  Form  in  Appendix  D.  The  process  of  documenting  the 
source  files  should  be  the  responsibility  of  the  business  members  of  the  migration  team 
who  are  intimate  with  the  source  files. 

The  migration  team  will  be  responsible  for  the  structure  and  architecture  of  the 
centralized  data  resource  file  that  is  based  on  the  intended  use  of  the  data  in  the 
application  and  the  structure  of  the  source  files.  A  migration  strategy  will  be  developed 
that  includes  determining  a  method  of  migration  to  maximize  the  information  available  to 
the  decision  support  tool. 

A  migration  strategy  is  achieved  by  determining  what  type  of  information  will  be 
maintained  and  at  what  level  of  detail.  The  term  granularity  is  used  in  data  warehousing 
to  identify  the  level  of  detail  that  is  contained  in  the  data  warehouse.  The  granularity  is 
determined  by  either  maintaining  detailed  data  elements  as  they  are  found  in  the  source 
files  or  by  summarizing  those  elements  to  reduce  the  granularity.  As  discussed  in  earlier 
chapters  summarizing  or  aggregating  data  in  the  data  resource  file  can  improve 
dramatically  the  performance  of  certain  queries.  Determining  this  strategy  early  in  the 
project  life  cycle  provides  the  system  design  team  the  advantage  of  a  stable  data  source 
for  a  most  of  the  overall  project. 

Once  the  design  of  the  decision  model  is  complete,  the  migration  team  can  begin 
to  assist  in  the  development  of  the  data  model  by  assisting  in  the  mapping  of  data 
elements  to  decision  factors  and  in  developing  the  business  rules.  The  logic  for  the 
eventual  data  extraction  process  the  team  will  use  comes  directly  from  these  business 
rules. 


2.  Determine  Extraction  Logic  and  Generate  Extraction  Routine 

By  being  involved  in  the  mapping  of  the  decision  measures  to  actual  data 
elements,  the  migration  team  can  begin  the  process  of  designing  the  logic  comprising  the 
data  extraction  process.  The  migration  team  will  be  required  to  identify  the  data  elements 
required  in  order  to  reduce  the  amount  of  data  that  is  migrated  to  the  data  resource  file. 
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The  logic  involved  in  the  extraction  process  can  implement  criteria  that  are  in  effect  for 
any  of  the  decision  measures.  An  example  of  this  logic  would  be  filtering  only  the  active 
reservists  from  the  transaction  files  by  identifying  a  specific  code  that  is  used  to  mark 
each  active  reservist. 

The  business  rules  developed  in  the  data  model  can  be  used  to  identify  common 
elements  of  aggregation  and  structure  that  are  required  in  the  data  resource  file. 
Extraction  routines  or  queries  can  be  developed  to  retrieve,  transform,  and  summarize 
data  into  a  format  that  will  optimize  the  usability  of  the  data  resource  file  by  the 
application.  The  extraction  routines  can  also  be  used  to  determine  and  set  the  indexed 
fields  of  the  data  resource  that  allow  the  application’s  querying  tool  to  access  the  stored 
data  in  an  efficient  manner. 

Data  scrubbing  and  cleansing  for  common  data  errors  should  also  be  implemented 
in  the  extraction  routines.  The  data  cleansing  and  conditioning  rules  are  developed  from 
the  data  quality  assessments  discussed  in  the  next  section. 

3.  Quality  Assured  Data 

As  discussed  in  Chapters  III  and  IV,  data  quality  is  an  important  issue  that  must 
be  investigated  very  early  in  a  project.  The  migration  team  will  be  directly  involved  in 
assessing  the  quality  level  of  data  available  as  well  as  that  required  by  the  proposed 
application.  This  portion  of  the  migration  process  can  do  more  to  ensure  the  success  of 
the  project  than  any  other.  The  first  and  most  important  task  that  must  be  undertaken 
pertaining  to  data  quality  is  the  Data  Quality  Baseline  Assessment.  Which  will  be  used  to 
identify  problems  with  accuracy  and  inconsistencies  of  the  source  data  that  can  be 
integrated  into  the  data  preprocessing  routines.  The  baseline  will  also  provide  the 
customer  and  the  development  team  with  a  deeper  understanding  of  the  data  that  are 
intended  for  use  in  the  project. 

In  order  for  the  migration  team  to  be  able  to  perform  an  analysis,  they  must  first 
develop  a  set  of  quality  metrics.  This  metrics  should  include,  at  a  minimum,  an 


61 


evaluation  of  the  source  files  and  their  ability  to  meet  the  six  characteristics  of  data 
quality  outlined  in  Chapter  III.  By  completing  this  initial  assessment,  the  migration  team 
will  be  able  to  identify  potential  problem  areas  and  make  a  determination  about  the 
legitimacy  of  the  data  to  support  the  decision  process. 

In  conjunction  with  the  customer,  the  migration  team  must  determine  and 
document  the  expectations  for  the  quality  of  data.  This  expectation  level  should  be  based 
on  the  level  of  risk  the  customer  is  willing  to  accept  if  the  resulting  decision  is  wrong 
because  of  the  underlying  data.  Meta-data  and  data  quality  metrics  will  be  used  to 
document  these  expectations  and  will  be  used  on  a  continued  basis  to  measure  the 
performance  of  the  data  as  it  pertains  to  the  expected  level  of  data  quality. 

The  data  errors  that  result  from  comparing  the  source  data  with  the  data  quality 
metrics  will  produce  additional  logic  to  be  included  in  the  extraction  phase.  This  process, 
known  as  data  conditioning  or  data  cleansing  will  filter  out  potential  data  quality 
problems  from  the  source  files  and  thereby  improve  the  data  quality  in  the  data  resource 
file.  This  process  will  not  necessarily  correct  the  source  of  those  problems,  as  quality 
problems  can  only  be  fully  corrected  in  the  source  files.  The  migration  team  is 
responsible  for  identifying  and  correcting  known  data  quality  problems  at  their  source 
wherever  possible. 

C.  DECISION  MODEL  DEVELOPMENT 

A  decision  model  is  the  foundation  upon  which  the  entire  SDSS  application  will 
be  built.  In  the  SDSS  concept,  a  known  decision  process  is  automated  by  accessing 
available  database  information.  The  reliability  and  trust  placed  on  the  outcome  of  the 
SDSS  must  be  based  on  the  quality  of  the  underlying  database  information.  Ken  Orr 
states  in  his  discussion  of  Data  Quality  and  Systems  Theory  that  the  quality  of  data  is  a 
function  of  its  use.  [Ref.  14:p.  9]  In  the  ARIES  SDSS  project  many  of  the  problems 
encountered  during  the  development  process  were  a  result  of  the  fact  the  data  had  not 
been  used  for  the  purpose  of  making  site  location  decisions.  Future  applications  should 
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attempt  to  leverage  the  data  that  is  being  used  in  current  business  processes  to  maximize 
the  inherent  quality  factors  arising  from  frequent  use. 

During  the  development  of  a  decision  model  for  a  new  application,  care  should  be 
taken  to  document  the  associated  data  that  is  currently  being  used  for  decision-making. 
Identifying  these  sources  will  assist  the  migration  team  in  their  efforts  to  gather  and 
present  source  data  of  the  highest  quality.  In  the  case  of  the  ARIES  SDSS  application, 
the  source  files  were  being  used  by  a  wide  community  of  different  users  at  US  ARC 
primarily  for  operational  purposes  as  opposed  to  decision  support  applications.  As  a 
result,  the  data  files  were  never  obliged  to  meet  the  stringent  quality  standards  required  of 
DSS  applications. 

D.  DATA  MODEL 

Problems  with  the  underlying  data  will  continue  to  be  a  problem  in  any 
application  that  attempts  to  leverage  information  stored  in  legacy  databases.  The  goal  in 
the  future  will  be  to  minimize  the  effect  these  problems  have  on  the  development  process. 
Listed  below  are  three  areas  that  can  smooth  the  transition  of  data  into  the  SDSS 
application  and  limit  the  impact  on  the  application  process. 

1.  Data  Standardization 

The  elements  of  source  data  must  be  standardized  when  the  legacy  data  files  being 
used  in  an  application  are  not  constructed  under  the  same  set  of  business  specifications. 
These  inconsistencies  in  rules  and  definitions  may  lead  to  problems  when  the  actual  data 
are  being  interpreted  out  of  the  original  context  in  which  it  was  defined.  Data 
standardization  is  achieved  by  logically  identifying,  grouping,  and  classifying  data. 

This  lack  of  standards  for  the  data  attributes  in  existing  applications  can  manifest 
in  many  ways.  For  example,  data  elements  in  different  files  may  not  label  the  same  field 
in  the  same  manner.  Fields  with  the  same  name  may  not  contain  the  same  information 
because  of  differences  in  usage  by  different  customers.  Examples  from  the  ARIES  SDSS 
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project  are  the  data  fields  that  represent  the  Unit  Identification  Codes  which  were  labeled 
differently  in  all  of  the  following  manners:  UIC1,  UIC,  CURRJJIC,  OWNJJIC. 
Furthermore,  the  UIC  field  was  used  to  identify  the  same  billet  in  different  ways.  The 
source  file  that  lists  the  actual  billets  or  jobs  at  a  reserve  facility  was  marked  with  the  use 
of  a  parent  UIC  that  identified  the  facility  itself.  The  individuals  actually  assigned  to  a 
billet  in  the  personnel  file,  on  the  other  hand,  are  listed  with  a  UIC  that  identifies  the 
actual  platoon  to  which  the  individual  is  assigned.  Such  inconsistencies  make  it  difficult 
to  verify  one  file  against  another.  In  this  case  it  would  be  impossible  to  identify  if  there 
were  a  specific  individual  assigned  for  every  billet  in  the  billet  structure  file  or  whether 
the  billet  to  which  an  individual  is  assigned  is  valid. 

The  purpose  of  data  standards  is  to  facilitate  common  use  and  understanding  in 
identifying  data  characteristics.  All  parties  involved  in  the  project  must  be  able  to 
interpret  the  same  information  in  exactly  the  same  way.  This  will  allow  the  development 
to  be  consistent  and  remove  the  need  for  each  individual  to  have  a  deep  understanding  of 
each  data  file. 

Data  standardization  must  be  conducted  during  the  extraction  process.  The  rules 
associated  with  the  standardization  specifications  will  be  implemented  in  the  extraction 
routines  developed  by  the  migration  team.  These  specifications  become  part  of  the 
transformation  process  and  the  application  can  then  be  developed  without  concern  about 
knowledge  of  the  individual  structure  of  source  files. 

2.  Meta-data  Documentation  Process 

Understanding  the  information  about  the  source  databases,  i.e.,  the  meta-data,  will 
allow  the  application  to  maximize  the  use  of  information  in  the  data  files.  There  were 
many  times  when  development  of  the  application  stalled  while  the  development  team 
waited  for  insight  about  one  or  more  source  data  files  to  be  provided  by  the  customer. 
Meta-data  has  two  parts;  (1)  the  detailed  information  about  the  data  elements,  their 
formats,  length  and  so  on,  and  (2)  the  business  information  and  understanding  about  the 
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data  file.  The  business  information  documents  the  rules  involved  in  populating  the  data 
file,  what  the  data  file  represents,  and  the  criteria  for  each  data  element. 

An  important  part  of  this  documentation  process  is  identifying  the  ownership  of  a 
source  file  and  maintaining  a  knowledgeable  point  of  contact  for  each  file.  These 
individuals  will  be  responsible  for  documenting  and  maintaining  the  meta-data 
throughout  the  lifecycle  of  the  project. 

Michael  Brackett  describes  the  need  for  a  meta-data  warehouse  that  goes  beyond 
the  traditional  data  information  storage  repositories  and  provides  a  “personal  help  desk” 
for  increasing  the  awareness  and  understanding  of  the  data  resource. [Ref.  3:p.  193]  This 
concept  would  allow  the  user  to  access  indexed  information  about  the  source  file  and 
therefore  maximize  his/her  ability  to  identify  what  data  are  available.  Using  meta-data  to 
the  fullest  extent  possible  would  benefit  the  SDSS  concept  by  allowing  the  decision¬ 
maker  to  tap  the  maximum  amount  of  knowledge  available  in  the  source  data  files.  The 
intent  here  is  not  create  a  meta-data  documentation  project  but  to  provide  the  maximum 
amount  of  available  information  concerning  the  data  files  to  the  development  team.  This 
tedious  and  time  consuming  project  will  pay  dividends  in  the  quality  of  the  output  from 
the  application. 

3.  Identify  Spatial  Aspects  of  Queries 

“Over  80%  of  business  data  have  some  spatial  context  such  as  a  customer  address, 
ZIP  Code,  or  location.”  [Ref.  5:p.  1]  Taking  advantage  of  the  spatial  aspects  of  the 
underlying  data  can  provide  valuable  information  to  the  decision  process  as  well  as 
enhance  the  performance  of  the  final  product.  Identifying  the  decision  measures  and  their 
associated  business  rules  that  rely  on  a  spatial  aspect  can  be  used  to  create  a  performance 
advantage. 

During  the  ARIES  SDSS  project,  complex  queries  that  involved  determining  if 
one  entry  in  a  table  existed  in  another  table  were  found  to  have  definite  spatial  aspects. 
The  ARIES  application  realized  a  twenty-fold  increase  in  query  performance  by  simply 
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allowing  the  GIS  application  to  conduct  that  portion  of  the  query  related  to  spatial 
parameters. 

Using  the  advantages  that  a  GIS  system  provides  to  localize  data  will  allow  the 
SDSS  application  to  access  larger  quantities  of  data  in  a  shorter  amount  of  time.  The 
ability  of  a  SDSS  application  to  access  large  quantities  of  data  efficiently  will  allow 
incremental  improvement  of  the  underlying  decision  process.  This  spatial  component 
may  also  allow  the  decision-maker  to  introduce  new  decision  measures  that  can  enhance 
the  final  outcome  of  the  SDSS. 

E.  SYSTEM  DESIGN 

Lessons  learned  from  the  system  design  portion  of  the  ARIES  project  are 
discussed  in  a  thesis  being  prepared  concurrently  by  LT  Peter  Falk.  As  the  principal 
designer  of  the  UI  application,  his  thesis  provides  a  detailed  discussion  of  the  issues  and 
challenges  required  to  complete  the  ARIES  SDSS  prototype  application. 

F.  FUTURE  CONSIDERATIONS 

The  concept  of  an  SDSS  application  is  still  evolving.  The  ARIES  prototype 
application  has  proved  the  viability  of  an  asset  that  integrates  a  GIS  system  and  DSS  tools 
to  leverage  the  knowledge  maintained  in  legacy  databases  for  decision-making  purposes. 
Enhancements  to  be  incorporated  in  methodologies  used  in  future  SDSS  applications  can 
be  separated  into  the  phases  of  the  development  process;  Decision  Model,  Data 
Migration,  Data  Model,  System  Design,  and  Testing. 

1.  Decision  Model 

Discussions  of  improvements  to  the  decision  model  have  been  discussed  fully  in 
Reference  1  and  Reference  2. 
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2. 


Data  Migration 


The  migration  process  implemented  in  the  ARIES  prototype  only  supports  the 
application  in  its  current  configuration.  The  migration  application,  Aries  Administrator, 
does  not  allow  for  additional  data  files  or  queries.  This  limitation  will  hinder  the  ability 
of  the  Administrator  to  support  the  ARIES  application  if  any  portion  of  the  decision 
hierarchy  is  changed.  Consideration  should  be  given  to  adapting  the  Administrator 
application  to  allow  the  ability  to  add  and  remove  files  and  queries  from  the  system.  The 
Administrator  currently  only  allows  for  a  complete  extraction  of  every  data  file  associated 
with  the  application,  a  time  consuming  event  that  is  not  necessary  if  every  data  file  has 
not  changed.  A  situation  may  arise  where  only  one  data  file  has  changed;  in  this  case,  the 
Administrator  application  should  be  adapted  to  allow  the  user  to  conduct  an  intelligent 
update  by  choosing  the  data  files  that  require  updating.  By  documenting  the  update 
frequency  of  a  data  file  in  the  meta-data  (i.e.,  weekly,  monthly,  etc.),  the  Administrator 
could  identify  data  files  that  have  not  been  updated. 

Based  on  the  importance  of  the  quality  of  data  migrated  to  the  data  resource  file,  it 
will  be  important  for  future  implementations  of  the  SDSS  methodology  to  have  an 
automated  method  for  determining  and  maintaining  data  quality.  The  migration  process 
is  the  phase  in  which  the  rules  associated  with  the  quality  metrics  can  be  used  to 
determine  the  quality  of  the  underlying  process.  By  automating  this  process,  exceptions 
can  be  generated  during  the  migration  process  that  will  identify  known  problems.  The 
migration  engine  would  be  able  to  provide  an  estimation  of  the  quality  of  data  and 
determine  whether  that  level  is  acceptable  based  on  the  expectations  provided  by  the  user. 

3.  Data  Model 

Because  the  idea  of  a  useful  decision  support  tool  involves  the  ability  to  be 
flexible  as  the  decision  process  changes,  an  automated  application  such  as  ARIES  must 
be  able  to  adapt  to  that  changing  environment.  There  should  be  a  system  in  place  that 
will  identify  changes  to  any  part  of  the  application,  (i.e.,  legacy  data,  decision  model,  data 


67 


quality  requirements,  etc.)  and  capture  the  effects  of  those  changes  on  the  application. 
For  example,  if  a  legacy  database  changes  in  any  way  the  migration  process  should  be 
adjusted  to  reflect  those  changes  as  well.  Also,  if  a  new  decision  measure  is  added,  the 
data  and  meta-data  to  support  that  measure  should  be  added  to  the  data  resource  file. 
Flexibility  of  the  application  will  be  the  key  to  its  longevity.  If  the  application  cannot  be 
updated  easily  as  the  business  process  or  environment  changes  it  will  die  a  certain  and 
swift  death. 

Data  values  that  do  not  change  over  long  periods  of  time  and  are  used  to  support 
decision  measures  should  be  calculated  only  as  those  values  change  and  not  during  each 
evaluation  session.  Values  are  calculated  during  each  session  for  measures  such  as 
Distance  to  Recruit  Station  and  Distance  to  ECS  that  do  not  change  on  a  frequent  basis. 
This  calculation  process  could  be  moved  from  the  evaluation  process  into  the  data 
migration  process  by  computing  a  value  that  is  pre-calculated  for  all  possible  sites.  Early 
identification  of  values  that  change  infrequently  will  reduce  the  overhead  required  in  the 
system  design  portion  of  the  project. 

4.  System  Design 

The  ARIES  SDSS  prototype  application  instituted  the  use  of  an  error  value  to  help 
identify  data  that  were  missing  or  returned  null  (i.e.,  -999).  This  was  effective  for 
identifying  a  number  of  potential  problems  and  provided  valuable,  albeit  limited, 
feedback  from  the  system  directly  to  the  user.  However,  this  concept  must  be  expanded 
to  include  other  error  codes  for  a  more  detailed  feedback  system.  The  error  codes  should 
kept  to  a  minimal  list  of  highly  useful  codes.  For  example,  other  error  codes  should  be 
used  to  identify  a  value  of  zero  for  a  measure  that  should  not  be  zero  (i.e..  Facility  Age). 
Another  example  would  be  additional  error  values  that  can  signify  different  types  of 
problems  resulting  from  the  calculations,  e.g.,  if  Number  Assigned  returned  for  a  unit  is 
null  or  zero  and  a  value  is  found  for  Losses  or  Transfers  for  the  same  unit,  then  an  error 
exists  in  one  of  the  files.  Currently  the  ARIES  application  assigns  a  zero  to  the  value  of 
the  calculation  to  avoid  a  division  by  zero.  Adding  an  error  value  to  identify 
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inconsistencies  between  data  files  would  provide  the  user  with  a  possible  reason  for  the 
values  of  Loss  Rate  or  Transfer  Rate  being  zero.  This  type  of  error  detection  would 
require  the  application  to  have  intelligent  business  rules  that  document  the  relationships 
between  decision  measures.  This  same  process  could  also  be  used  to  identify  possible 
problems  with  ratio  values  discussed  in  Chapter  IV. 

Other  considerations  were  made  to  allow  the  user  the  ability  to  choose  a  default 
value  for  measures  that  returned  with  error  values.  The  user  could  choose  to  use  a 
previously  determined  value  such  as  the  mean.  This  would  allow  the  decision  model  to 
include  this  measure  in  the  evaluation  of  the  site. 

5.  Testing 

The  implementation  and  testing  phase  will  continuously  monitor  and  evaluate  the 
results  of  the  system  and  provide  feedback  to  the  application.  The  application  must  have 
the  ability  to  monitor  and  evaluate  the  results  and  provide  feedback  to  the  system  to 
improve  the  process.  During  the  ARIES  project,  a  problem  like  this  was  identified  in  the 
post  implementation  phase.  Every  value  the  system  returned  for  Facility  Condition, 
Measure  #4,  was  the  same  value,  GREEN.  This  resulted  in  each  site  receiving  the 
maximum  utility  for  that  measure  and  effectively  nullifying  any  benefits  that  measure 
provided  to  the  decision  model.  A  problem  like  this  could  be  identified  by  observing  the 
trends  of  the  answers  and  having  the  testing  module  send  a  flag  to  the  user  that  reports  a 
problem  with  a  decision  measure.  The  application  would  provide  some  basic  information 
to  assist  in  diagnosing  the  problem.  A  full  testing  of  the  application  requires  a  formal 
feedback  mechanism  that  will  allow  problems  to  be  documented  as  well  as  the  solutions 
and  corrections  to  be  documented. 

G.  CHAPTER  SUMMARY 

An  inspection  of  the  final  ARIES  SDSS  prototype  application  can  provide  future 
implementations  of  the  SDSS  methodology  with  valuable  information.  The  oversights, 
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problems,  and  mistakes  discovered  during  the  design  and  development  of  this  “proof  of 
concept  application  has  led  to  recommended  changes  to  the  development  process. 
Among  the  changes  are  the  addition  of  a  DMP  and  minor  refinements  to  the  decision 
model,  data  model,  and  system  design  phases. 

The  need  for  a  DMP  was  unfortunately  realized  too  late  in  the  ARIES  project. 
Many  of  the  stumbling  blocks  in  the  development  could  have  been  avoided  had  there 
been  an  integrated  plan  for  the  movement  of  data  from  its  source  to  the  DSS  application. 
A  detailed  DMP  will  involve  assigning  responsible  individuals  to  gather  data,  develop 
extraction  logic,  and  generate  extraction  routines  or  queries.  This  process  will  provide  a 
stable  source  of  data  as  a  foundation  for  the  application.  The  migration  team  will  also  be 
responsible  for  evaluating  the  baseline  quality  of  the  data  set  and  generating  a  plan  to 
reach  the  desired  level  of  quality  in  the  final  product. 

Development  of  the  decision  model  drives  the  entire  SDSS  development  process 
and  should  be  given  the  proper  amount  of  attention.  The  failure  of  the  ARIES  project  to 
identify  sources  receiving  frequent  use  required  the  development  team  to  spend  valuable 
time  validating  and  correcting  the  source  data  files.  It  is  important  to  identify  in  the 
decision  elicitation  process  the  data  elements  that  are  currently  being  used  in  a  system. 

Because  the  data  files  used  by  the  ARIES  project  were  made  up  of  a  collection  of 
large  legacy  data  files  from  varying  sources,  there  was  a  need  for  all  members  of  the  team 
to  have  the  same  understanding  of  what  comprised  these  files.  A  system  to  maintain 
detailed  information  about  the  data  files,  i.e.  meta-data,  was  required.  Detailed 
documentation  should  accompany  the  transfer  of  a  source  file  from  the  customer  to  the 
development  team.  Standardizing  the  common  data  elements  in  the  data  resource  file 
provides  the  application  with  future  flexibility.  The  ARIES  project  was  able  to  leverage 
the  spatial  aspects  inherent  the  underlying  data  that  were  directly  associated  with  decision 
measures.  Future  SDSS  implementations  should  make  every  attempt  to  harness  the 
spatial  aspects  of  the  data. 
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As  the  SDSS  methodology  is  used  in  future  implementations,  the  need  to  add 
increased  flexibility  and  feedback  to  the  user  will  continue  to  enhance  the  usability  of  the 
final  product.  Future  implementations  will  concentrate  more  attention  on  the  quality  of 
data  and  ability  of  the  increased  intelligence  in  the  system  to  provide  the  user  with 
highest  quality  decision  support  tool  available. 
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VI.  CONCLUSION 


A.  SUMMARY 

Developing  a  Spatial  Decision  Support  System  (SDSS)  for  the  Army  Reserve 
TPU  relocation  decision  problem  provided  insight  into  new  methods  to  improve  the 
development  methodology  for  a  SDSS.  The  Army  Reserve  Installation  Evaluation 
System  (ARIES)  is  the  result  of  using  this  SDSS  methodology  to  integrate  a  detailed 
decision  model  in  an  automated  DSS.  The  system  integrated  two  commercial  software 
programs,  Logical  Decisions  for  Windows™  as  a  decision  model  solver  and  Maplnfo™ 
as  GIS  mapping  engine.  A  user  interface  (UI),  created  in  Visual  Basic™,  served  as  an 
integration  tool  for  retrieving  data  and  passing  information  to  between  these  components. 

The  system  architecture  developed  for  the  ARIES  project  consisted  of  a  decision 
model,  a  data  model,  and  an  integrating  application.  The  decision  model  was  developed 
under  separate  research  and  constituted  the  basis  for  gathering  the  required  data  to 
evaluate  the  readiness  of  an  Army  Reserve  facility.  The  decision  measures  developed  in 
the  decision  model  generated  a  set  of  business  rules  that  were  mapped  to  actual  data 
elements.  Because  of  the  complexity  of  the  queries,  the  business  rules  and  the  quantity  of 
data  that  was  involved,  the  development  team  identified  a  need  for  a  centralized  data 
resource  file.  The  ARIES  data  resource  file  used  data  warehousing  techniques  to  conduct 
extractions  from  the  many  source  data  files,  and  was  optimized  for  the  ARIES  decision 
process.  A  data  preprocessing  application  was  created  to  generate  this  data  resource  file. 
The  ARIES  Administrator  is  a  Visual  Basic™  application  that  acts  as  a  migration  engine 
to  transform  the  source  data  into  the  structure  required  by  the  ARIES  application. 

Because  of  the  spatial  nature  of  the  decision  model  involved,  the  ARIES  data 
resource  file  used  basic  data  warehousing  techniques,  such  as  aggregation  and 
summarization,  to  take  advantage  of  the  spatial  attributes  of  the  source  data.  This 
spatially  enabled  data  set  is  a  special  form  of  a  data  warehouse  or  data  mart  called  a 
spatial  data  warehouse.  The  spatial  aspects  of  the  data  were  used  in  conjunction  with  the 
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GIS  application  to  maximize  query  performance.  The  primary  advantage  of  using 
geocoded  (i.e.,  spatially  identified)  information  in  the  queries  was  a  significant  increase 
in  performance  for  the  ARIES  application. 

Through  the  use  of  a  spatial  data  warehouse,  the  SDSS  is  buttressed  with  a  stable 
data  source  engineered  to  provide  the  underlying  decision  model  with  the  highest  quality 
data  in  a  timely  manner.  The  integration  of  a  data  migration  plan  (DMP)  in  the  system 
development  process  ensures  that  the  data  resource  generated  for  the  application  allows 
the  SDSS  application  to  generate  meaningful  outcomes. 

B.  CONTRIBUTIONS 

This  research  implemented  the  theoretical  SDSS  methodology  by  creating  an 
integrated  application  in  support  of  complex  site  location  decisions.  As  a  proof  of 
concept  application,  ARIES  demonstrates  the  ability  to  integrate  a  GIS  mapping  engine 
and  a  decision  model  solver  in  a  seamless  and  flexible  environment  that  allows  users  to 
leverage  operational  legacy  database  information  for  decision-making  purposes.  At  an 
applied  level,  this  research  identified  additional  requirements  necessary  during  the 
development  process  to  provide  SDSS  applications  with  stable  and  accurate  data  sources 
of  acceptable  quality.  These  additional  requirements  involved  the  development  of  a  data 
migration  plan  (DMP)  and  the  implementation  of  a  data  quality  assessment  plan. 

1.  General  Contributions 

In  addition  to  the  specific  benefits  afforded  to  USARC,  this  project  identified 
enhancements  to  existing  SDSS  methodology  development  to  ensure  data  quality.  Most 
important  is  the  requirement  to  transport  and  transform  the  underlying  data  into  a  format 
that  allows  the  SDSS  application  to  access  that  data  in  the  most  efficient  manner.  The 
DMP  that  is  outlined  in  this  paper  provides  the  basis  for  a  data  resource  to  instantiate  a 
decision  model  in  a  fashion  that  improves  performance  and  assures  a  confidence  in  of  the 
outcome. 
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Data  quality  was  identified  as  a  limiting  factor  of  the  SDSS  application  too  fully 
analyze  the  site  evaluation  as  well  as  provide  an  outcome  that  is  credible  to  the  user.  This 
identified  the  need  to  incorporate  the  evaluation  and  assessment  of  source  file  data  quality 
as  a  continuing  effort  throughout  the  development  process.  An  important  element  in 
correcting  and  maintaining  an  expected  level  of  quality  for  data  is  the  assignment  of 
individual  responsibility  for  identifying  and  correcting  the  inadequacies  of  the  source  data 
files. 


2.  Specific  Contributions  to  USARC 

The  primary  benefit  of  the  ARIES  project  is  the  use  of  a  very  powerful  decision 
tool  to  provide  the  decision-maker  with  detailed  information  previously  not  available.  By 
implementing  the  detailed  and  complex  queries  that  provides  values  for  the  ARIES 
decision  measures,  USARC  has  benefited  by  being  able  to  analyze  this  information.  The 
ARIES  application  goes  far  beyond  mere  data  retrieval,  allowing  the  decision-maker  to 
manipulate  the  results  of  these  complex  queries  in  a  highly  flexible  and  fully  functional 
decision  environment. 

This  research  showed  by  detailed  analysis  that  14  of  the  20  decision  measures  that 
have  been  automated  will  return  a  valid  value  more  than  90%  of  the  time.  Further  data 
quality  analysis  would  provide  the  USARC  Readiness  team  with  the  assurance  that  the 
ARIES  application  is  basing  its  outcome  on  data  that  are  accurate,  consistent,  complete, 
timely,  and  unique,  as  well  as  valid. 

Through  implementation  of  the  Administrator,  USARC  has  benefited  from  spatial 
data  warehousing  techniques  to  improve  performance  of  the  system  by  centralizing  the 
data  elements  required  for  the  TPU  relocation  decision  problem.  The  Administrator 
provides  a  stable  data  set  to  the  ARIES  application  by  using  queries  that  can  be  repeated 
time  after  time  as  the  source  files  change.  The  Administrator  also  provides  an  automated 
data  quality  filter  that  facilitates  data  cleansing  of  the  source  data  sets. 
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Even  without  an  implementable  SDSS  application,  USARC  has  received  the 
benefit  of  an  in  depth  look  at  the  data  files  they  are  using  in  their  everyday  decision¬ 
making.  It  has  forced  the  group  of  experts  to  verify  and  validate  the  assumptions  they 
may  have  made  concerning  the  site  location  decision  problem. 

The  real  value  of  this  research  may  lie  in  the  basis  it  provides  for  future  SDSS 
applications  to  increase  access  to  decision  information  directly  from  legacy  data  sources. 
Developing  a  strategy  to  provide  SDSS  with  high  quality  data  creates  a  foundation  for  a 
much  higher  probability  of  successful  implementation  of  decision-based  systems. 
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APPENDIX  A.  DECISION  MODEL  MEASURES 


This  appendix  contains  detailed  information  about  each  decision  measure  that  was 
automated  in  the  ARIES  SDSS  prototype  application.  The  information  includes  a  description  of 
each  measure,  the  business  rule  used  to  calculate  the  associated  value,  base  units,  source  files, 
associated  ACROPOLIS  tables,  query  or  queries  involved  in  the  calculation,  a  description  of  the 
yield  curve,  and  a  graph  of  the  yield  curves.  “ACROPOLIS”  is  the  file  name  for  the  ARIES  data 
resource  file. 

The  term  “in  the  area”  in  this  Appendix  is  defined  as  being  with  in  a  50-mile  radius  of  the 
moving  unit  or  proposed  facility. 
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Measure  1.  Facility  Backlogged  Maintenance 

Definition:  Facility  Backlogged  Maintenance  provides  the  total  dollar  value  of  backlogged 

maintenance.  This  provides  an  indication  of  the  initial  investment  required  to 
correct  the  significant  maintenance  problems  with  a  proposed  facility. 

Calculation:  The  Backlogged  Maintenance  value  is  based  upon  the  sum  values  for  maintenance 
actions  documented  for  each  facility  in  the  “CWE_TOTAL”  field  of  the 
RPINFODT  file.  The  summation  is  done  during  the  data  extraction  phase. 

Maint_Cost[Sum  of  outstanding  maintenance  actions  for  a  facility] 

Units:  Dollars 


Source  File:  RPINFODT 


ACROPOLIS  Table(s):  RPINFODT_ 

Query:  Maint_Cost 

SELECT  MAINT_COST 
FROM  RPINFODT_ 

WHERE  RPINFODT.FACID  =  ProposedFacility.FACJD 

Yield  Curve:  A  linear  relationship  is  assumed  between  the  backlogged  maintenance  costs  and 
utility.  Every  dollar  required  or  saved  in  this  category  is  expected  to  have  equal 
utility  to  a  relocating  unit. 


Max  Utility:  0  Min  Utility:  >  1,000,000 
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Measure  2.  Facility  Operating  Costs 


Definition:  Facility  Operating  Costs  provide  an  indication  of  the  financial  resources  that  are 

required  to  maintain  the  facility  in  a  serviceable  condition.  This  includes  both 
utilities  and  minor  maintenance  costs. 

Calculation:  Operating  Costs  are  extracted  from  the  “COSTPRSF”  field  of  the  FPS  file. 

COST_PR_SF[Retrieve  the  Cost  per  Square  Foot  for  a  facility] 

Units:  Dollars  per  square  foot  per  month 

Source  File:  FPS 

ACROPOLIS  Table(s):  FPS_ 

Query:  COST_PR_SF 

SELECT  COSTPRSF 
FROM  FPS_ 

WHERE  FPS.FACID  =  ProposedFacility.FAC  _ID 

Yield  Curve:  A  linear  relationship  is  assumed  between  the  operating  costs  and  utility.  Every 
dollar  required  or  saved  in  this  category  is  expected  to  have  equal  utility  to  a 
relocating  unit. 
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Measure  3.  Facility  Age 


Definition:  This  measure  indicates  the  age  of  the  primary  structure  on  the  proposed  relocation 

site.  It  is  intended  to  reflect  an  assumed  long  term  structural  degradation  with 
time. 

Calculation:  Facility  age  is  calculated  based  upon  the  acquisition  date  found  in  the  INTEREST 
file.  The  acquisition  date  is  compared  to  the  current  date  and  the  difference  is 
determined  in  months. 

DATEACQ  [Current  Year  -  Date  Acquired] 

Units:  Months 

Source  File:  INTEREST 

ACROPOLIS  Table(s):  INTEREST_ 

Query:  DATEACQ 

SELECT  DATE_ACQ 
FROM  INTEREST_ 

WHERE  INTEREST..FACID  =  ProposedFacility.FACJD 
Yield  Curve:  A  linear  relationship  is  used  between  facility  age  and  utility. 


Utility 


o  1,200 

Months 

Max  Utility:  0  Min  Utility.  >1,200 
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Measure  4.  Facility  Condition 


Definition:  Facility  Condition  is  based  upon  a  visual  inspection  of  the  structure  and  provides 

an  indication  of  the  serviceability  of  the  primary  structures. 

Calculation:  This  measure  is  based  upon  the  ISR  part  1  rating  entered  in  the  “FAC_COND” 
field  of  the  FPS  file. 

FACCOND  [Retrieve  Facility  Condition] 

Units:  No  Units(Green,  Amber,  Red) 

Source  File:  FPS 

ACROPOLIS  Table(s):  FPS_ 

Query:  FAC_COND 

SELECT  FAC  COND 
FROM  FPS_ 

WHERE  FPS.FACID  =  ProposedFacility.FACJD 

Yield  Curve:  The  utility  of  these  three  categories  varies  in  discrete  steps.  A  facility  that  is 
categorized  as  “green”  is  judged  to  be  approximately  twice  a  desirable  as  one  that 
is  assigned  an  “amber”  rating. 

1.0 


Utility  0.5 


0 

RED  AMBER  GREEN 

Max  Utility:  GREEN  Min  Utility:  RED 
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Measure  5.  Facility  Ownership 

Definition:  This  measure  indicates  whether  the  facilities  at  a  proposed  relocation  site  are 

leased  or  owned. 

Calculation:  Facility  Ownership  is  based  upon  the  entry  in  the  “GOVTOWN”  field  of  the 
COMPLEX  file. 

GO VT  OWN  [Retrieve  Ownership  Status] 

Units:  No  Units(Yes/No) 

Source  Data:  COMPLEX 

ACROPOLIS  Table(s):  COMPLEX_ 

Query:  GOVT_OWN 

SELECT  GOVT_OWN 
FROM  COMPLEX_ 

WHERE  COMPLEX.FACID  =  ProposedFacility.FACJD 

Yield  Curve:  Facilities  that  are  owned  by  the  government  are  preferred  as  relocation  sites  over 
those  facilities  that  are  leased.  The  owned  sites  are  assigned  the  maximum  utility 
value  of  1.0,  while  leased  sites  are  given  a  0  utility  score. 


Max  Utility:  Yes  Min  Utility:  No 
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Measure  6.  Competition 


Definition:  This  measure  provides  an  indication  of  the  level  of  competition  for  potential 

reservists.  It  considers  only  Army  Reserve  and  Army  National  Guard  units  in  the 
area  of  the  relocation  site. 


Calculation:  Competition  is  determined  by  the  number  of  positions  that  must  be  filled  by  all 
other  Army  Reserve  and  Army  National  Guard  (ARNG)  units  in  the  area  of  the 
proposed  relocation  site.  For  Army  Reserve  units,  the  number  of  required 
positions  is  determined  by  counting  the  number  of  records  in  the  G19TRUE  file 
associated  with  each  UIC  in  the  area.  For  ARNG  units,  the  value  is  found  in  the 
“AUTH”  field  of  the  NGNONCL  file. 

NO_AUTH_NG[Number  Authorized  National  Guard]  + 

NO_REQD  [Number  Area  Reservists  Required] 

Units:  Number  of  competing  positions 

Source  File:  COMMAND  PLAN,  G1 7,  G19TRUE,  GEOREF,  NGNON  CL 


ACROPOLIS  Table(s):  CMDPLAN,  G17Natl,  G19Natl,  VALIDUIC 


Query:  Area-FACID  List(MapInfo) 

SELECT  FACID  INTO  TempFACID 

FROM  GEOREF 

WHERE  Object  Within  ObjAreaBuffer 

ORDER  BY  FAC  ID 

(Note:  ObjAreaBuffer  is  equal  to  300  miles) 


VALIDUIC 

SELECT  UIC,  FAC  ID,  UnitName,  City,  State,  Zip 
FROM  G17Natl 

WHERE  G17Natl.UIC  =  ANY  (SELECT  CMDPLAN.UIC 

FROM  CMDPLAN) 


Area-UIC  List 

SELECT  DISTINCT  UIC  INTO  AREA  UIC 
FROM  VALIDJJIC 

WHERE  VALID  UIC.FAC  ID  =  ANY  (SELECT  ARE A_F ACID.F AC  ID 

FROM  ARE A_F ACID) 


NO_AUTH_NG(MapInfo) 

SELECT  *  INTO  TempNGUnits 

FROM  NONCLOS 

WHERE  Obj  Within  ObjAreaBuffer 
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Yield  Curve: 


SELECT  SUM(AUTH)  “No_AUTH_NG”  INTO  Strength 
FROM  TempNGUnits 

NOREQD 

SELECT  SUM(UIC_TOTAL)  AS  TOTAL_REQD 
FROM  G19Natl 

WHERE  G 1 9Natl.UIC  =  ANY  (SELECT  AREA_UIC.UIC 

FROM  AREA  UIC) 


A  linear  relationship  exists  between  the  number  of  competing  positions  from  other 
units  and  the  utility  of  a  relocation  site.  The  level  of  no  site  utility  in  this  measure 
begins  at  10,000  positions  which  is  above  the  maximum  value  expected. 


Max  Utility:  0 


Min  Utility:  >  10,000 


Measure  7.  Average  Area  Drill  Attendance 


Definition:  This  measure  indicates  the  fraction  of  reservists  with  satisfactory  drill  attendance 

for  all  existing  units  in  the  area  of  the  proposed  relocation  site.  Areas  with  a  high 
fraction  of  satisfactory  drill  attendance  are  preferred  relocation  sites  because  units 
relocated  to  that  area  are  assumed  to  perform  similarly  in  drill  attendance. 

Calculation:  This  measure  considers  the  last  four  quarters  of  data  contained  in  the  FINANCE 
file.  After  initial  screening,  the  number  of  reservist  with  21  or  more  drill  periods 
for  the  year  is  divided  by  the  total  number  of  people  who  meet  the  screening. 

DRILL  SAT  [Number  of  reservists  with  >21  drill  periods  in  a  year] 

DRILL  TOTAL  [Number  of  reservists  required  to  drill] 


Units:  Ratio 


Source  File:  COMMAND  PLAN,  FINANCE,  G17,  G19TRUE,  GEOREF 


ACROPOLIS  Table(s):  CMDPLAN,  FINANCE  ,  FINANCE  QTR,  G17Natl,  G19Natl, 

VALID  UIC 


Query:  Area-FACID  List(MapInfo) 

SELECT  FAC_ID  INTO  TempF ACID 

FROM  GEOREF 

WHERE  Object  Within  ObjAreaBuffer 

ORDER  BY  FAC  ID 

(Note:  ObjAreaBuffer  is  equal  to  300  miles) 

VALIDJJIC 

SELECT  UIC,  FAC  JD,  UnitName,  City,  State,  Zip 
FROM  G17Natl 

WHERE  G17Natl.UIC  =  ANY  (SELECT  CMDPLAN.UIC 

FROM  CMDPLAN) 


Area-UIC  List 

SELECT  DISTINCT  UIC  INTO  AREAJJIC 
FROM  VALIDJJIC 

WHERE  VALID JJIC.FAC JD  =  ANY  (SELECT  AREA  FACID.FAC  ID 

FROM  AREA  FACID) 
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FINANCE_CY 

SELECT  UIC,  COUNT(UIC)  AS  UIC_TOTAL  INTO  FINANCECY 
FROM  FINANCEQTR 
WHERE  (Select  Case) 

Case  1st  Qtr  FY 

(UTA1Q1PF  +  UTA2Q1PF  +  UTA3Q1PF  +  UTA4Q1PF)  >  20 
Case  2nd  Qtr  FY 

(UTA2Q1PF  +  UTA3Q1PF  +  UTA4Q1PF  +  UTA1QCFY)  >  20 
Case  3rd  Qtr  FY 

(UTA3Q1PF  +  UTA4Q1PF  +  UTA1QCFY  +  UTA2QCFY)  >  20 
Case  4th  Qtr  FY 

(UTA4Q1PF  +  UTA1QCFY  +  UTA2QCFY  +  UTA3QCFY)  >  20 
GROUP  BY  UIC 
ORDER  BY  UIC 

DRILL-SAT 

SELECT  SUM(UIC_TOTAL)  AS  TOTAL  SAT 
FROM  FINANCECY 

WHERE  FINANCE_CY.UIC  =  ANY  (SELECT  AREAJJIC.UIC 

FROM  AREA _UIC) 


DRILL-TOTAL 

SELECT  SUM(UIC  TOTAL)  AS  DRILL  TOTAL 
FROM  FINANCE_ 

WHERE  FINANCE  .UIC  =  ANY  (SELECT  AREA_UIC.UIC 

FROM  AREAJJIC.UIC) 

Yield  Curve:  The  utility  of  the  average  drill  attendance  rate  increases  linearly  between  the 
values  of  0  and  0.6.  Above  that  point,  increases  in  the  attendance  rate  result  in 
diminishing  returns.  Values  above  0.6  become  increasingly  uncommon. 


Utility 


Max  Utility:  1.0  Min  Utility:  0.0 
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Measure  8.  Area  Loss  Rate 


Definition:  This  measure  indicates  the  fraction  of  reservists  who  left  the  reserves  in  the 

previous  fiscal  year,  for  all  existing  units  in  the  area  of  the  proposed  relocation 
site.  Areas  with  a  low  loss  rate  are  preferred  relocation  sites  because  units 
relocated  to  that  area  will  also  experience  low  loss  rates. 

Calculation:  The  number  of  losses  to  units  in  the  area  in  the  previous  fiscal  year  is  divided  by 
the  number  of  reservists  currently  assigned  to  these  units.  Losses  are  identified 
through  the  transfer  mnemonic  field  (TRMN=”LOSS”)  of  the  FyxxLOSS  file. 
The  number  of  assigned  reservists  is  determined  by  counting  all  of  the  personnel 
records  in  the  G18CWE  file  associated  with  each  UIC  in  the  area. 


NO_LOSS[Total  Number  of  Losses  in  the  last  year] 
NO_ASSN[Total  Number  Reservists  Assigned] 


Units:  Ratio 

Source  File:  COMMAND  PLAN,  FYxxLOSS,  G17,  G18CWE,  GEOREF 
ACROPOLIS  Table(s):  CMDPLAN,  FYxxLOSS,  G17Natl,  G18Natl_UIC,  VALIDJJIC 


Query  :  Area-FACID  List(MapInfo) 

SELECT  FAC _ID  INTO  TempFACID 

FROM  GEOREF 

WHERE  Object  Within  Obj  AreaBuffer 

ORDER  BY  FAC_ID 

(Note:  ObjAreaBuffer  is  equal  to  300  miles) 

VALIDUIC 

SELECT  UIC,  FAC  ID,  UnitName,  City,  State,  Zip 
FROM  G17Natl 

WHERE  G1 7Natl.UIC  =  ANY  (SELECT  CMDPLAN.UIC 

FROM  CMDPLAN) 


Area-UIC  List 

SELECT  DISTINCT  UIC  INTO  AREAJLJIC 
FROM  VALIDUIC 

WHERE  VALID  UIC.FAC  ID  =  ANY  (SELECT  AREA  FACID.FAC  ID 

FROM  ARE A_F ACID) 


NO_ASSN 

SELECT  SUM(UIC_TOTAL)  AS  TOTAL  ASSN 
FROM  G18Natl_UIC 

WHERE  G1 8Natl_UIC.UIC  =  ANY  (SELECT  AREA  UIC.UIC 

FROM  AREAUIC) 
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NOLOSS 

SELECT  SUM(UIC_TOTAL)  AS  TOTAL_LOSS 
FROM  FYxxLOSS 

WHERE  FYxxLOSS.UIC  =  ANY  (SELECT  AREA_UIC.UIC 

FROM  AREA_UIC) 

Yield  Curve:  This  function  includes  both  concave  and  convex  regions.  The  inflection  point 
occurs  at  a  loss  rate  of  .33  and  a  utility  of  0.5.  Based  on  experience,  a  loss  rate  of 
one  third  per  year  was  considered  to  be  typical.  Any  loss  rate  below  this  value 
has  relatively  high  utility,  whereas  loss  rates  above  the  inflection  point  quickly 
approach  a  utility  of  zero. 


Utility 


Max  Utility:  0 


Min  Utility:  1 


Measure  9.  Area  Transfer  Rate 


Definition:  This  measure  indicates  the  fraction  of  reservists  who  transferred  to  different  units 

in  the  previous  fiscal  year  for  all  existing  units  in  the  area  of  the  proposed 
relocation  site.  Areas  with  a  low  transfer  rate  are  preferred  relocation  sites 
because  units  relocated  to  that  area  will  also  experience  low  transfer  rates. 

Calculation:  The  number  of  transfers  in  the  previous  fiscal  year  is  divided  by  the  number  of 
reservists  currently  assigned  to  the  unit.  Transfers  are  identified  through  the 
transfer  mnemonic  field  (TRMN=”TRFD”)  of  the  FyxxLOSS  file.  The  number  of 
assigned  reservists  is  determined  by  counting  all  of  the  personnel  records  in  the 
G18CWE  file  associated  with  each  UIC. 


NO_XFER[Total  Number  of  Transfers  in  the  last  year] 
NO_ASSN[Total  Number  Reservists  Assigned] 


Units:  Ratio 

Source  File:  COMMAND  PLAN,  FYxxLOSS,  G17,  G18CWE,  GEOREF 


ACROPOLIS  Table(s):  CMDPLAN,  G17Natl,  G18Natl_UIC,  FYxxXFER,  VALIDJJIC 


Query:  Area-FACID  List(MapInfo) 

SELECT  FAC _ID  INTO  TempFACID 

FROM  GEOREF 

WHERE  Object  Within  ObjAreaBuffer 

ORDER  BY  FAC_ID 

(Note:  ObjAreaBuffer  is  equal  to  300  miles) 

VALIDJJIC 

SELECT  UIC,  FAC  ID,  UnitName,  City,  State,  Zip 
FROM  G17Natl 

WHERE  G17Natl.UIC  =  ANY  (SELECT  CMDPLAN.UIC 

FROM  CMDPLAN) 


Area-UIC  List 

SELECT  DISTINCT  UIC  INTO  AREA  JJIC 
FROM  VALIDJJIC 

WHERE  V  ALID  JJIC  .FAC  ID  -  ANY  (SELECT  AREA  FACID.FAC  ID 

FROM  AREA  F  ACID) 
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NO_ASSN 

SELECT  SUM(UIC_TOTAL)  AS  TOTAL  ASSN 
FROM  G18Natl_UIC 

WHERE  G1 8Natl_UIC.UIC  =  ANY  (SELECT  AREAJJIC.UIC 

FROM  AREA_UIC) 


NOXFER 

SELECT  SUM(UIC_TOTAL)  AS  TOTAL  XFER 
FROM  FYxxXFER 

WHERE  FYxxXFER. UIC  =  ANY  (SELECT  AREAJJIC.UIC 

FROM  AREAUIC) 


Yield  Curve:  This  function  includes  both  concave  and  convex  regions  The  inflection  point 
occurs  at  a  loss  rate  of  .33  and  a  utility  of  0.5.  Based  on  experience,  a  transfer  rate 
of  one  third  per  year  was  considered  to  be  typical.  Any  loss  rate  below  this  value 
has  relatively  high  utility  (close  to  1.0),  whereas  loss  rates  above  the  inflection 
point  quickly  approach  a  utility  of  zero. 


Max  Utility:  0  Min  Utility:  1 


/ 
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Measure  10.  Area  Average  Manning 


Definition:  This  measure  indicates  the  ability  to  fill  the  required  positions.  An  average  value 

is  determined  for  all  existing  units  in  the  area  of  the  proposed  relocation  site. 
Areas  with  high  average  manning  levels  are  preferred  relocation  sites  because 
units  relocated  to  that  area  will  also  experience  high  manning  levels. 

Calculation:  The  number  of  reservists  assigned  to  area  units  (based  upon  the  number  of 
personnel  records  in  G18CWE  file  associated  with  each  UIC)  is  divided  by  the 
number  of  required  positions  (based  upon  the  number  of  positions  in  the 
G19TRUE  file  associated  with  each  UIC).  An  average  is  calculated  for  all  UIC’s 
in  the  area  of  the  proposed  site. 

NO_ASSN[Total  Number  Reservists  Assigned] 

NO_REQD[Number  Area  Reservists  Required] 

Units:  Ratio 

Source  File:  COMMAND  PLAN,  G17,  G18CWE,  G19TRUE,  GEOREF 

ACROPOLIS  Table(s):  CMDPLAN,  G17Natl,  G18Natl_UIC,  G19Natl,  VALIDJJIC 

Query:  Area-FACID  List(MapInfo) 

SELECT  FACJD  INTO  TempF ACID 

FROM  GEOREF 

WHERE  Object  Within  ObjAreaBuffer 

ORDER  BY  FACJD 

(Note:  ObjAreaBuffer  is  equal  to  300  miles) 

VALIDJJIC 

SELECT  UIC,  FACJD,  UnitName,  City,  State,  Zip 
FROM  G17Natl 

WHERE  G17Natl.UIC  =  ANY  (SELECT  CMDPLAN.UIC 

FROM  CMDPLAN) 


Area-UIC  List 

SELECT  DISTINCT  UIC  INTO  AREAJJIC 
FROM  VALIDJJIC 

WHERE  VALID  JJIC.FAC JD  =  ANY  (SELECT  ARE A_F ACID .F AC  ID 

FROM  AREAFACID) 
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NO_ASSN 

SELECT  SUM(UIC_TOTAL)  AS  TOTALASSN 
FROM  G18Natl_UIC 

WHERE  G1 8Natl_UIC.UIC  =  ANY  (SELECT  AREA_UIC.UIC 

FROM  AREAJJIC) 


NOREQD 

SELECT  SUM(UIC_TOTAL)  AS  TOTAL  REQD 
FROM  G19Natl 

WHERE  G19Natl.UIC  =  ANY  (SELECT  AREAJUIC.UIC 

FROM  AREA_UIC) 

Yield  Curve:  It  is  desirable  that  area  units  be  able  to  exceed  their  minimum  manning 
requirements.  All  manning  levels  above  125%  are  considered  to  have  maximum 
utility.  Manning  levels  below  this  value  drop  off  quickly  in  terms  of  utility. 


Max  Utility:  1.25 


Min  Utility:  0 
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Measure  11.  Distance  to  Nearest  Recruit  Station 


Definition:  Distance  to  the  nearest  Recruiting  Station  provides  one  indication  of  recruiter 

effectiveness. 

Calculation:  The  straight-line  distance  from  the  proposed  site  to  the  closest  recruiting  station  is 
calculated  using  a  geocoded  version  of  the  RZA  file. 

DIST_RZA[Determine  distance  to  nearest  Recruit  Station] 

Units:  Miles 

Source  Data:  RZA 

ACROPOLIS  Table(s):  NONE 

Query:  DIST_RZA(MapInfo) 

SELECT  * 

FROM  RZA 

WHERE  Obj  Withing  ObjDistanceBuffer  into  TempRZA 
(Note:  ObjDistanceBuffer  is  equal  to  300  miles) 

SELECT  Distance((CentroidX(Obj),  CentroidY(Obj),  FacIDLat,  FacIDLong,  “mi”) 
FROM  TempRZA 

ORDER  BY  Distance  INTO  TempRZA.Dist 

Yield  Curve:  The  effectiveness  of  a  recruiting  station  in  filling  positions  at  a  reserve  unit  is 
fairly  high  if  the  two  are  within  a  half  hour  drive  of  each  other.  It  is  assumed  that 
recruiters  are  most  effective  in  the  area  close  to  their  recruiting  station  and  that 
reserve  recruits  must  be  located  near  the  unit  with  which  they  will  serve.  A 
distance  of  30  miles  is  assigned  an  average  utility  of  0.5.  A  small  change  in 
distance  results  in  less  change  in  desirability  when  the  distance  is  very  small  or 
very  large  than  it  does  when  the  distance  is  around  30  miles. 


Max  Utility:  0 


Min  Utility:  >100 
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Measure  12.  Available  Transfers  from  Closing  Units 

Definition:  This  value  indicates  the  total  number  of  personnel  assigned  to  closing  units  within 

50  miles  of  the  proposed  site. 

Calculation:  A  list  of  Unit  Identification  Codes  (UIC’s)  is  created  which  contains  only  those 
units  scheduled  to  close  within  18  months.  These  units  are  identified  by  an  entry 
of  5 B  in  the  “Tier”  field  of  the  G17  file.  The  number  of  potential  transfers  from 
closing  units  is  calculated  by  summing  the  number  of  records  in  the  G18CWE 
database  for  the  closing  units  which  are  located  in  the  area  of  the  proposed 
relocation  site. 

TOTAL_AVAIL[Total  Number  of  Available  Reservists  from  Area  Closing  Units] 

Units:  Ratio 

Source  File:  COMMAND  PLAN,  G1 7,  G18CWE,  GEOREF,  US_ZIPS(MapInfo) 

ACROPOLIS  Table(s):  CMDPLAN,  G17Natl,  VALIDJJIC 

Query:  Area-FACID  List(MapInfo) 

SELECT  FAC_ID  INTO  TempF ACID 

FROM  GEOREF 

WHERE  Obj  Within  objAreaBuffer 

ORDER  BY  FAC  ID 

(Note:  objAreaBuffer  is  equal  to  300  miles) 

VALIDUIC 

SELECT  UIC,  FAC_ID,  UnitName,  City,  State,  Zip 
FROM  G17Natl 

WHERE  G17Natl.UIC  =  ANY  (SELECT  CMDPLAN.UIC 

FROM  CMDPLAN) 


Area-UIC  List 

SELECT  DISTINCT  UIC  INTO  AREAJJIC 
FROM  VALIDJJIC 

WHERE  VALID JJIC.FAC JD  =  ANY  (SELECT  AREA  FACID.FAC  ID 

FROM  AREAFACID) 


AREACLOSJJIC 
SELECT  UIC 
FROM  G17Natl 
WHERE  G17Natl.TIER  =  “5B” 

AND  G17Natl.UIC  =  ANY  (SELECT  AREAJJIC.UIC 

FROM  AREAJJIC) 
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AREA_ZIPCODE(MapInfo) 

SELECT  ZIPCODE 
FROM  US_ZIPS 
WHERE  Obj  Within  objAreaBuffer 
ORDER  BY  ZIP_CODE 

Area_Gl  8_ZIP(MapInfo) 

SELECT  DISTINCT  UIC,  ZIPCODE,  COUNT(UIC)  AS  UICTOTAL 
FROM  G18CWE 
GROUP  BY  UIC,  ZIPCODE 
ORDER  BY  UIC,  ZIPCODE 

TOTAL_AVAIL 

SELECT  SUM(UIC  TOTAL)  AS  TOTAL  AVAIL 
FROM  Area_Gl  8ZIP 

WHERE  AreaGl  8ZIP.UIC  =  ANY  (SELECT  AREACLOSUIC.UIC 

FROM  AREA_CLOS_UIC) 

AND  Area  G  1 8_ZIP .ZIPCODE  -  ANY  (SELECT  AREA__ZIPCODE.ZIP 

FROM  AREAZIPCODE) 

Yield  Curve:  The  shape  of  this  function  assumes  diminishing  returns  in  the  number  of  transfers 
available.  Experience  suggests  that  for  an  average  unit  of  100  people, 
approximately  half  have  prior  reserve  experience  and  that  approximately  half  of 
the  people  in  a  closing  unit  will  be  able  to  transfer  their  skills  directly  to  a  new 
unit.  The  value  of  the  first  100  reservists  increases  at  a  nearly  linear  rate  because 
they  provide  preferred  fills  for  approximately  50  of  the  positions  of  the  moving 
unit.  A  value  of  100  personnel  is  assigned  a  utility  of  0.9.  The  incremental  value 
added  by  each  additional  person  over  100  continues  to  drop  until  no  marginal  gain 
is  expected  over  500. 


Max  Utility:  >250 
Min  Utility:  0 
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Measure  13.  IRR  Available 


Definition: 

Calculation: 

Units: 

Source  File: 
Query: 


Yield  Curve: 


Individual  Ready  Reserve  (IRR)  Available  is  the  number  of  IRR  members  living 
in  the  area  of  the  proposed  relocation  site.  This  is  a  measure  of  the  size  of  the 
prior  service  market. 

A  geographical  query  returns  the  total  number  of  IRR  members  living  within  a 
specified  distance  of  the  proposed  relocation  site.  This  process  requires  a 
geocoded  version  of  the  IRR  file. 

TOTAL  IRRfTotal  Number  of  Available  IRR  from  the  Area] 

People 

IRR 

ArealRR(MapInfo) 

SELECT  ZIPC  “ZIP”,  LEFT$(PMOS,  3)  “MOS” 

FROM  IRR 

WHERE  Obj  Within  objAreaBuffer 

AND  ZIPC  o  AND  PMOS  o 
ORDER  BY  ZIPC 

(Note:  objAreaBuffer  is  equal  to  300  miles) 

TOTALIRR 

SELECT  COUNT(*)  AS  TOTAL  IRR 
FROM  ArealRR 

For  a  typical  unit  of  100  people,  it  is  assumed  that  approximately  40  positions 
could  best  be  filled  by  IRR  members.  The  recruiting  rate  for  the  IRR  is 
approximately  1  percent,  so  an  area  that  offers  4,000  IRR  members  is  assigned  an 
average  utility  of  0.5.  Above  this  point,  there  are  diminishing  returns.  The 
market  begins  to  exceed  the  personnel  demand  of  a  moving  unit  and  limited 
recruiting  efforts  become  marginally  less  effective.  The  utility  of  smaller 
numbers  quickly  drops  off  because  of  the  importance  of  this  source  of  recruits. 


Max  Utility:  >10,000 
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Min  Utility:  0 
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Measure  14.  Recruit  Market 


Definition:  The  Recruit  Market  measure  estimates  the  total  number  of  males  who: 

1 .  live  in  the  area  of  the  proposed  relocation  site 

2.  Would  score  in  the  top  half  on  the  Armed  Forces  Qualification  Test  (AFQT) 

3.  Fall  into  the  desired  age  group  (17-29  years  old) 

Calculation:  This  measure  sums  the  entries  for  all  mental  categories  1  through  3 A,  and  all 
ethnic  groups  for  the  zip  codes  of  interest  in  the  Qualified  Military  Available 
(QMA)  file.  The  version  of  QMA  used  contains  only  the  estimates  for  males 
within  the  age  range  of  17  to  29. 


TOTAL_MARKET[Total  Non-Prior  Service  Personnel  from  the  Area] 
Units:  People 

Source  File:  QMA,  US  ZIPS(MapInfo) 

ACROPOLIS  Table(s):  NONE 


Query:  QMA(MapInfo) 

SELECT  LEFT$(ZIP,  5)  “ZIPCODE”,  MWCAT12,  MWCAT3A,  MBCAT12, 
MBCAT3A,  MHCAT12,  MHCAT3A 
FROM  QMA 

WHERE  Obj  Within  objAreaBuffer 
ORDER  BY  ZIP 

(Note:  objAreaBuffer  is  equal  to  300  miles) 

AREAZIPCODE(MapInfo) 

SELECT  ZIP  CODE 
FROM  US_ZIPS 
WHERE  Obj  Within  objAreaBuffer 
ORDER  BY  ZIP  CODE 

TOT  AL_M  ARKET 

SELECT  SUM(MWCAT12+MWCAT3 A+MBCAT1 2+MBCAT3A+ 

MHC AT  1 2+MHC AT3  A)  AS  TOTAL_MARKET 
FROM  QMA 

WHERE  QMA.ZIP  =  ANY  (SELCET  AREA_ZIPCODE.ZIP 

FROM  AREAZIPCODE) 


105 


Yield  Curve:  Approximately  half  of  a  typical  unit  of  1 00  reservists  is  filled  by  recruits  with  no 
prior  service.  Assuming  a  recruit  rate  of  0.25  percent,  there  must  be  at  least 
20,000  people  in  the  area  of  the  proposed  relocation  site  who  meet  all  of  the 
requirements  stated  above.  This  value  is  assigned  a  typical  utility  of  0.5.  As  the 
number  increases,  there  are  diminishing  returns.  The  market  begins  to  exceed  the 
personnel  demand  of  a  moving  unit  and  limited  recruiting  efforts  become 
marginally  less  effective. 


Max  Utility:  >250,000  Min  Utility:  0 
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Measure  15.  Reassignments 


The  Reassignments  measure  indicates  the  fraction  of  the  reservists  assigned  to  the 
moving  unit  who  currently  live  within  a  specified  distance  (50  miles)  of  the 
proposed  relocation  site 

This  measure  is  calculated  by  first  determining  all  zip  codes  that  lie  within  a 
specified  distance  of  the  proposed  relocation  site  (based  upon  zip  code  centroid) 
and  then  identifying  all  reservists  who  both  live  within  one  of  the  identified  zip 
codes  (based  upon  the  “ZIP”  field  of  the  G18CWE  file)  and  are  assigned  to  the 
moving  unit  (based  upon  the  “UIC”  field  of  the  G18CWE  file).  Then  the  number 
available  reassignments  is  divided  by  the  total  number  of  reservists  assigned  to 
the  moving  unit. 

TOTAL_RESERVISTS  [Total  Number  of  Available  Reservists  from  the  Moving  Unit] 
UIC_TOTAL[Total  Number  of  Reservists  Assigned  Moving  Unit] 

Units:  Ratio 

Source  File:  G18CWE,  US_ZIPS(MapInfo) 

ACROPOLIS  Table(s):  G18Natl 

Query:  AREA_ZIPCODE(MapInfo) 

SELECT  ZIPCODE 
FROM  US_ZIPS 
WHERE  Obj  Within  objAreaBuffer 
ORDER  BY  ZIP  CODE 

G18(MapInfo) 

SELECT  UIC,  LEFT$(ZIP,5)  “ZIPCODE”,  PRI  “MOS” 

FROM  G18CWE 

WHERE  Obj  Within  objGl  8Buffer  AND  PRI  o  “” 

ORDER  BY  UIC,  ZIP 
INTO  G18 

Area_G18_ZIP 

SELECT  DISTINCT  UIC,  ZIPCODE,  COUNT(UIC)  AS  UIC_TOTAL 

FROM  G18 

GROUP  BY  UIC,  ZIPCODE 

ORDER  BY  UIC,  ZIPCODE 


Definition: 


Calculation: 
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TOTALRESERVISTS 

SELECT  SUM(UIC_TOTAL)  AS  TOTAL  RESERVISTS 

FROM  Area_G18_ZIP 

WHERE  AreaG  1 8_ZIP.UIC  =  MovingUnit.UIC 

AND  Area_G  1 8_ZIP.ZIPCODE  =  ANY  (SELECT  AREA_ZIPCODE.ZIP 

FROM  AREAZIPCODE) 


UICTOTAL 
SELECT  UIC  TOTAL 
FROM  G18Natl 

WHERE  G18Natl.UIC  =  MovingUnit.UIC 

Yield  Curve:  The  current  location  will  always  receive  a  utility  score  of  0.0  on  this  measure.  For 
relatively  close  relocation  sites,  this  function  was  made  to  be  convex,  assigning 
high  utility  values  to  alternatives  that  are  close  to  the  current  location. 


Potential  reassignments  / 

Total  number  of  reservists 


Max  Utility:  1 .0 


Min  Utility:  0.0 


Measure  16.  Distance  to  Area  Maintenance  Support  Activity 

Definition:  Distance  to  the  nearest  Area  Maintenance  Support  Activity  (AMS A)  is  calculated 

as  a  proxy  measure  for  response  time  and  support  quality. 

Calculation:  The  straight-line  distance  from  the  proposed  site  to  the  closest  AMS  A  is 
calculated  using  a  geocoded  version  of  the  AMSA  file. 

DIST_AMSA[Determine  distance  to  nearest  AMSA  Site] 

Units:  Miles 

Source  Data:  AMSA 

ACROPOLIS  Table(s):  NONE 

Query:  DISTAMSA(MapInfo) 

SELECT  * 

FROM  AMSA 

WHERE  Obj  Withing  ObjDistanceBuffer  into  TempRZA 
(Note:  ObjDistanceBuffer  is  equal  to  300  miles) 

SELECT  Distance((CentroidX(Obj),  CentroidY(Obj),  FacIDLat,  FacIDLong,  “mi”) 

FROM  TempAMSA 

ORDER  BY  Distance  INTO  TempAMSA.Dist 

Yield  Curve:  The  desirability  of  a  relocation  site  is  relatively  insensitive  to  small  changes  in 
distance  for  both  close  and  distant  AMSA  sites.  Little  degradation  in  service  is 
expected  if  the  AMSA  can  have  parts  and  technicians  on  site  within  a  couple 
hours  using  a  car  or  truck.  It  is  possible  that  a  trainer  that  breaks  down  in  the 
morning  may  be  operational  for  an  afternoon  training  session.  At  approximately 
200  miles  (assigned  a  0.5  utility)  it  starts  to  become  impractical  to  expect  same 
day  service  and  avoid  an  overnight  stay.  Eventually  it  becomes  necessary  to 
consider  flying  rather  than  driving  which  is  likely  to  further  reduce  the 
responsiveness  and  effectiveness  of  the  AMSA. 


Max  Utility:  0  Min  Utility:  >  500 
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Measure  17.  Distance  to  Nearest  Equipment  Concentration  Site 

Definition:  Distance  to  the  nearest  Equipment  Concentration  Site  (ECS)  provides  an 

indication  of  the  training  time  that  must  be  used  to  travel  back  and  forth. 

Calculation:  The  straight-line  distance  from  the  proposed  site  to  the  closest  ECS  is  calculated 
using  a  geocoded  version  of  the  ECS  file. 

DIST  ECS  [Determine  distance  to  nearest  ECS] 

Units:  Miles 

Source  Data:  ECS 

ACROPOLIS  Table(s):  NONE 

Query:  DIST_ECS(MapInfo) 

SELECT  * 

FROM  ECS 

WHERE  Obj  Withing  ObjDistanceBuffer  into  TempECS 
(Note:  ObjDistanceBuffer  is  equal  to  300  miles) 

SELECT  Distance((CentroidX(Obj),  CentroidY(Obj),  FacIDLat,  FacIDLong,  “mi”) 
FROM  TempECS 

ORDER  BY  Distance  INTO  TempECS.Dist 

Yield  Curve:  The  desirability  of  an  Equipment  Concentration  Site  is  relatively  insensitive  to 
small  changes  in  distance  for  both  close  and  distant  sites.  Typically,  a  site  that 
can  be  reached  within  an  hour  and  ten  minutes  is  not  significantly  less  desirable 
than  one  that  can  be  reached  in  ten  minutes.  An  hour  of  one-way  travel  time  is 
not  normally  considered  to  be  excessive  and  allows  for  most  of  the  time  to  be 
spent  training  on  a  one  day  training  exercise.  At  approximately  60  miles 
(assigned  a  0.5  utility)  it  starts  to  become  impractical  to  expect  useful  training  to 
be  conducted  on  a  day  trip  and  avoid  an  overnight  stay.  Eventually  it  becomes 
necessary  to  consider  flying  rather  than  driving  which  is  likely  to  further  reduce 
the  desirability  of  the  ECS. 
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Measure  18.  Facility  Weekends  Used 

Definition:  Facility  Weekend  Usage  provides  the  number  of  weekends  per  month  that  the 

facility  is  currently  in  use.  This  measure  treats  a  facility  as  a  limited  resource  that 
is  incrementally  depleted  as  more  units  are  assigned.  Since  most  units  require 
exclusive  use  of  the  facility  one  weekend  every  month,  the  number  of  weekends 
used  normally  corresponds  to  the  number  of  units  assigned  and  is  typically  limited 
to  four. 

Calculation:  This  value  is  extracted  from  the  “RS_WKND_PM”  field  of  the  COMPLEX  file. 

WKND_USED[Retrieve  Number  Weekends  Facility  Used  per  Month] 

Units:  Weekends  per  month 

Source  Data:  COMPLEX 

ACROPOLIS  Table(s):  COMPLEX_ 

Query:  WKNDJJSED 

SELECT  COMPLEX  .FAC  WKND  USED 
FROM  COMPLEX, 

WHERE  COMPLEX_.FAC_ID  =  ProposedFacility.FACJD 

Yield  Curve:  Although  some  exceptions  exist,  a  typical  facility  offers  no  utility  to  a  relocating 
unit  if  all  four  weekends  are  already  being  used.  Although  most  facilities  with 
three  units  or  less  should  be  able  to  accommodate  a  new  unit  and  might  be  viewed 
as  having  equal  utility,  other  issues  such  as  full  time  administrative  space  and 
available  equipment  storage  space  make  a  facility  with  fewer  units  currently 
assigned  slightly  more  desirable. 


Max  Utility:  0  Min  Utility:  4 
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Measure  19.  Available  MOS  from  Closing  Units 

Definition:  This  measure  provides  the  number  of  reservists  from  closing  units  in  the  area  of 

the  proposed  relocation  site  who  possess  a  Military  Occupational  Specialty 
(MOS)  needed  by  the  relocating  unit.  These  people  provide  a  preferred  pool  of 
trained  and  qualified  recruits. 

Calculation:  The  number  of  personnel  records  (from  the  G18CWE  file)  that  meet  all  the 
following  requirements  are  counted: 

1.  The  reservist  is  assigned  to  a  unit  that  is  scheduled  to  close  (a 
TIER=”5B”  entry  in  the  G17  file  is  used  to  produce  a  list  of  closing 
units). 

2.  The  reservist  lives  in  a  zip  code  in  the  area  of  the  proposed  relocation 
site. 

3.  The  reservist’s  primary  MOS  is  needed  by  the  moving  unit. 

If  the  three  MOS  groups  with  the  largest  number  of  members  in  the  moving  unit 
account  for  more  than  50  percent  of  the  total  unit  membership,  then  only  those 
three  MOS’s  are  considered.  Otherwise  all  MOS’s  required  by  the  moving  unit 
are  considered  as  an  MOS  of  interest. 

TOT AL  CLOS  MOS  [Total  Number  of  Available  Reservists  from  Area  Closing  Units  with 

MOS’s  of  Interest] 


Units:  Number  of  people 

Source  File:  COMMAND  PLAND,  G17,  G18CWE,  GEOREF,  US  ZIPS(MapInfo) 
ACROPOLIS  Table(s):  CMDPLAN,  G17Natl,  G18Natl,  VALIDJJIC 


Query:  Area-FACID  List(MapInfo) 

SELECT  FAC_ID  INTO  TempFACID 

FROM  GEOREF 

WHERE  Obj  Within  objAreaBuffer 

ORDER  BY  FACJD 

(Note:  objAreaBuffer  is  equal  to  300  miles) 

VALIDJJIC 

SELECT  UIC,  FAC  ID,  UnitName,  City,  State,  Zip 
FROM  G17Natl 

WHERE  G17Natl.UIC  =  ANY  (SELECT  CMD_PLAN.UIC 

FROM  CMD_PLAN) 
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Area-UIC  List 

SELECT  DISTINCT  UIC  INTO  AREAJLJIC 
FROM  VALIDUIC 

WHERE  VALID JUIC.FACJD  =  ANY  (SELECT  ARE A_F ACID. FAC_ID 

FROM  AREAFACID) 


NoAssnxMOS 

SELECT  MOS,  COUNT(*)  AS  MOS_COUNT  INTO  NoAssnxMOS 
FROM  G18Natl 

WHERE  G18Natl.UIC  =  MovingUnit.UIC 

GROUP  BY  MOS 

ORDER  BY  COUNT(*)  DESC 

MOSTOTAL 

SELECT  SUM(MOS_COUNT)  AS  MOS_TOTAL 
FROM  NoAssnxMOS 

MOSTOP3 

SELECT  TOP  3  MOS_COUNT 
FROM  NoAssnxMOS 

MOSJNTEREST 

IF  MOS  TOP3/MOS  TOTAL  <  50% 

SELECT  MOS  INTO  MOSJNTEREST 
FROM  NoAssnxMOS 
ORDER  BY  MOS 
IF  MOS_TOP3/MOS_TOTAL  >  50% 

SELECT  TOP  3  MOS  INTO  MOS  JNTEREST 
FROM  NoAssnxMOS 
ORDER  BY  MOS 

AREACLOSUIC 
SELECT  UIC 
FROM  G17Natl 
WHERE  G17Natl.TIER  =  “5B” 

AND  G17Natl.UIC  =  ANY  (SELECT  AREAJJIC.UIC 

FROM  AREA_UIC) 


AREAZIPCODE(MapInfo) 
SELECT  ZIP_CODE  AS  ZIP 
FROM  US_ZIPS 
WHERE  Obj  Within  objAreaBuffer 
ORDER  BY  ZIP_CODE 
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G18(MapInfo) 

SELECT  UIC,  LEFT$(ZIP,5)  “ZIPCODE”,  PRI  “MOS” 

FROM  G18CWE 

WHERE  Obj  Within  objGl  8Buffer  AND  PRI  o 
ORDER  BY  UIC,  ZIP 
INTO  G18 

Area_G18_M0S 

SELECT  DISTINCT  UIC,  ZipCode,  MOS,  COUNT(UIC)  AS  UIC_TOTAL 
INTO  Area_Gl  8_M0S 
FROM  G18 

GROUP  BY  UIC,  ZipCode,  MOS 
ORDER  BY  UIC,  ZipCode,  MOS 

Area_G18_ZIP 

SELECT  DISTINCT  UIC,  ZIPCODE,  COUNT(UIC)  AS  UICTOTAL 

FROM  G18 

GROUP  BY  UIC,  ZIPCODE 

ORDER  BY  UIC,  ZIPCODE 

TOTALCLOSMOS 

SELECT  SUM(UIC_TOTAL)  AS  TOTAL_CLOS_MOS 
FROM  Area_G18_MOS 

WHERE  Area_Gl  8_MOS.MOS  =  ANY  (SELECT  MOS_INTEREST.MOS 

FROM  MOSINTEREST) 

AND  Area_Gl  8ZIP.UIC  =  ANY  (SELECT  AREA_CLOS_UIC.UIC 

FROM  AREA_CLOS_UIC) 

AND  Area_Gl  8ZIP.ZIPCODE  =  ANY  (SELECT  AREA_ZIPCODE.ZIP 

FROM  AREAZIPCODE) 


Yield  Curve:  The  shape  of  this  function  assumes  diminishing  returns  on  the  number  of  transfers 
available.  Experience  suggests  for  an  average  unit  of  100  people,  that  it  is 
unusual  to  expect  more  than  a  third  of  the  members  to  transfer  from  closing  units 
with  the  proper  MOS.  Of  the  reservists  in  this  category,  only  half  typically 
transfer,  so  a  value  of  60  personnel  is  assigned  a  utility  of  0.9.  The  incremental 
value  added  by  each  additional  person  over  60  continues  to  drop  until  no  marginal 
gain  is  expected  over  250. 
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Measure  20.  Available  MOS  IRR 


Definition:  This  measure  provides  the  number  of  Individual  Ready  Reserve  members  who 

live  in  the  area  of  the  proposed  relocation  site  and  who  possess  a  Military 
Occupational  Specialty  (MOS)  needed  by  the  relocating  unit.  These  people 
provide  a  preferred  pool  of  trained  recruits. 

Calculation:  The  number  of  IRR  members  who  possess  an  MOS  needed  by  the  moving  unit 
and  who  live  in  the  area  of  the  proposed  relocation  site  (based  upon  the  zip  code 
of  their  home  of  record  in  the  IRR  file)  are  counted.  If  the  three  MOS  groups  with 
the  largest  number  of  members  in  the  moving  unit  account  for  more  than  50 
percent  of  the  total  unit  membership,  then  only  those  three  MOSs  are  considered. 
Otherwise  all  MOSs  required  by  the  moving  unit  are  considered  as  an  MOS  of 
interest. 

TOT AL  IRR  MOS  [Total  Number  of  Available  Reservists  from  the  IRR  with  MOS’s  of 

Interest] 


Units:  Number  of  People 

Source  File:  IRR,  G18CWE 
ACROPOLIS  Table(s):  G18Natl, 


Query:  NoAssnxMOS 

SELECT  MOS,  COUNT(*)  AS  MOS_COUNT  INTO  NoAssnxMOS 
FROM  G18Natl 

WHERE  G1 8Natl.UIC  =  MovingUnit.UIC 

GROUP  BY  MOS 

ORDER  BY  COUNT(*)  DESC 

MOSTOTAL 

SELECT  SUM(MOS  COUNT)  AS  MOS  TOTAL 
FROM  NoAssnxMOS 

MOSJTOP3 

SELECT  TOP  3  MOS_COUNT 
FROM  NoAssnxMOS 

MOSINTEREST 

IF  MOS _TOP3/MOS_TOTAL  <  50% 

SELECT  MOS  INTO  MOS  INTEREST 
FROM  NoAssnxMOS 
ORDER  BY  MOS 
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IF  M0ST0P3/M0ST0TAL  >  50% 

SELECT  TOP  3  MOS  INTO  MOS  JNTEREST 
FROM  NoAssnxMOS 
ORDER  BY  MOS 

IRR(MapInfo) 

SELECT  ZIPC  “ZIP”,  LEFT$(PMOS,  3)  “MOS” 

FROM  IRR 

WHERE  Obj  Within  obj  AreaBuffer  and  ZIPC  o  “”  AND  PMOS  o  “” 

ORDER  BY  ZIPC 
INTO  IRR 

TOTALIRRMOS 

SELECT  SUM(UI C_T OTAL)  AS  TOTAL_CLOS_MOS 
FROM  IRR 

WHERE  IRR.MOS  =  ANY  (SELECT  MOSINTEREST.MOS 

FROM  MOSJNTEREST) 

Yield  Curve:  IRR  members  represent  preferred  recruits  for  less  than  half  of  the  positions  of  a 
typical  moving  unit  (approximately  40  out  of  100)  because  of  issues  such  as 
seniority  and  changes  in  the  skills  associated  with  an  MOS.  The  success  rate  of 
recruiting  IRR  members  is  approximately  1  out  of  100,  so  4000  IRR  members  in 
the  area  of  the  relocation  site  are  required  to  provide  sufficient  market  to  fill  the 
40  positions.  The  value  of  4000  is  assigned  the  average  utility  value  of  0.5.  As 
the  IRR  market  increases  it  exceeds  the  needs  of  the  moving  unit  and  makes  the 
limited  recruiting  efforts  marginally  less  effective. 


Max  Utility:  >25,000  Min  Utility:  0 
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APPENDIX  B.  ARIES  SOURCE  DATA  FILE  META  DATA 


This  appendix  contains  the  meta-data  that  could  be  documented  for  the  ARIES 
SDSS  project  source  files.  “ACROPOLIS”  as  used  in  this  appendix  refers  to  the  file 
name  of  the  ARIES  data  resource  file. 


Index 


1.  AMSA . 123 

2.  COMMAND  PLAN . 125 

3.  COMPLEX . 127 

4.  ECS . 129 

5.  FINANCE . 131 

6.  FPS . 133 

7.  FYxxLOSS . 135 

8.  G17 . 137 

9.  G18CWE . 139 

10.  G19TRUE . 141 

11.  GEOREF . 143 

12.  INTEREST . 145 

13.  IRR . 147 

14.  NGNON  CL . 149 

15.  QMA . 151 

16.  RPINFODT . 153 

17.  RZA . 155 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  AMSA _  Location:  ../Aries/MapBasic/USARCData 

File  Type:  FoxPro  2.6 _  Size(MB):  .026 _  No.  Records:  190 

Associated  ARIES  Tables:  Not  in  ACROPOLIS,  Geocoded  for  use  in  Maplnfo 


File  Description: 

AMSA  File  contains  information  about  the  location  of  each  AMSA  station.  It  is  used  in  determining  the 
value  for  the  distance  to  the  nearest  AMSA. 


Required  Data  Elements 


Name 

Description 

Data 

Type 

Format 

Key 

Field 

Facility  Identification  Code 

Char 

No 

facjitle 

Facility  Title 

Char 

No 

facstreet 

Street  Address  of  Facility 

Char 

No  ; 

faccity 

City  Facility  is  located  in 

Char 

No 

fac_state 

State  Facility  is  located  in 

Char 

No 

faczip 

Zip  Code  of  the  Facility 

Char 

No 

abb_type 

Char 

No 

Extract  Queries: 
NONE 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  COMMAND  PLAN _  Location:  ACROPOLIS _ 

File  Type:  FoxPro  2.6  Size(MB):  3.29  No.  Records:  9,897 

Associated  ARIES  Tables:  CMDPLAN 


File  Description: 

Command  Plan  is  the  file  that  contains  information  about  each  unit  in  the  Army  Reserve.  It  is  used  to 
cross  reference  FAC  ID's  with  UIC’s.  It  is  also  used  to  screen  for  Valid  UIC's  with  in  the  next  13  months. 


Required  Data  Elements 


Name 

Description 

Data 

Type 

Format 

Key 

Field 

UIC 

Unit  Identification  Code 

Char 

yes 

FACID 

Facility  Identification  Code 

Char 

no 

EDATE 

Effective  Date  of  Transaction 

Char 

no 

Extract  Queries: 

CMDPLAN 

SELECT  DISTINCT  UIC,  FACID  AS  FACID, 
EDATE 

FROM  COMMANDPLAN 
WHERE  (FACID  o  "N/A")  AND  (FACID  o 
"TBD")  AND  (FACID  o  "")  AND 
(LEN(FACID)  >  2)  AND 
((LEFT(EDATE,4)  =  T998'  AND 
MID(EDATE,5,2)  <=  '02')  OR 
(LEFT(EDATE,4)  <=  '1997')) 
ORDER  BY  UIC,  EDATE  DESC 
INTO  CMDPLAN 
INDEX  ON  UIC  as  UIC 
Note:  Application  automatically  adjusts  the 
dates  to  obtain  a  13  month  window. 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  COMPLEX _  Location:  ACROPOLIS _ 

File  Type:  FoxPro  2.6 _  Size(MB):  2.1 _  No.  Records:  1,557 

Associated  ARIES  Tables:  COMPLEX 


File  Description: 

The  Complex  File  is  used  to  determine  if  the  facility  is  owned  by  or  leased  to  the  government  and  the 
number  of  weekends  each  facility  is  used  during  a  month. 


Required  Data  Elements 


Name 

Description 

Data 

Type 

Format 

Key 

Field 

FACJD 

Facility  Identification  Code 

Char 

yes 

GOVTOWN 

Facility  ownership  status 

Char 

Y/N 

no 

RSWKNDPM 

Reserve  Station  weekend  usage  per  mo. 

Number 

6-4 

no 

Extract  Queries: 

COMPLEX_ 

SELECT  FACJD,  GOVT_OWN  AS 

FACOWNED,  RS_WKND_PM  AS 
FACWKNDUSED 
FROM  COMPLEX 
WHERE  LEN(FACJD)  =  5 
INTO  COMPLEX_ 

INDEX  ON  FACJD  as  FACID,  Primary,  Unique 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name: 


File  Type:  FoxPro  2.6 


Location:  , .  AMapBasic\USARCData\ 


Size(MB):  .004 


No.  Records:  30 


Associated  ARIES  Tables:  Not  in  ACROPOLIS,  Geocoded  for  use  in  Maplnfo _ 


File  Description: 

ECS  File  contains  information  about  the  location  of  each  Equipment  Center.  It  is  used  in  determining  the 
distance  to  the  nearest  ECS. 


Required  Data  Elements 


faczip 


abb_type 


Description 


Facility  Identification  Code 


Facility  Title 


Street  Address  of  Facility 


City  Facility  is  located  in 


State  Facility  is  located  in 


Zip  Code  of  the  Facility 


Data 

Type 


Char 


Char 


Char 


Char 


Char 


Char 


Key 

Format  Field 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  FINANCE _  Location:  ACROPOLIS _ 

File  Type:  FoxPro  2.6 _  Size(MB):  83.4  No.  Records:  311,793 

Associated  ARIES  Tables:  FINANCE,,  FINANCEQTR _ 


File  Description: 

Finance  is  the  file  that  contains  pay  information  for  the  previous  eight  quarters  about  every  Reservist.  It  is 
used  to  obtain  information  about  Drill  Attendance  for  a  given  Facility. 


Required  Data  Elements 


Name 

Description 

Data 

Type 

Format 

Key 

Field 

Current  Unit  Identification  Code 

Char 

No 

UTA1QCFY 

Unit  Training  Attendance  for  1st  Qtr  this  FY 

Number 

No 

UTA2QCFY 

Unit  Training  Attendance  for  2nd  Qtr  this  FY 

Number 

No 

UTA3QCFY 

Unit  Training  Attendance  for  3rd  Qtr  this  FY 

Number 

No 

UTA4QCFY 

Unit  Training  Attendance  for  4th  Qtr  this  FY 

Number 

No 

UTA1Q1PF 

Unit  Training  Attendance  for  1st  Qtr  last  FY 

Number 

No 

UTA2Q1PF 

Unit  Training  Attendance  for  2nd  Qtr  last  FY 

Number 

No 

UTA3Q1PF 

Unit  Training  Attendance  for  3rd  Qtr  last  FY 

Number 

No 

UTA4Q1PF 

Unit  Training  Attendance  for  4th  Qtr  last  FY 

Number 

No 

Extract  Queries: 

FINANCE_ 

SELECT  T  &  LEFT(CURR_UIC,5)  AS  UIC, 
COUNT (CURRUIC)  AS 
UICTOTAL 
FROM  FINANCE 
WHERE  CURR  UIC  o 
ORDER  BY  CURR  UIC 
GROUP  BY  CURR  UIC 
INTO  FINANCE^ 

INDEX  ON  UIC  as  UIC 


FINANCEQTR 

SELECT  "WM  &  LEFT(CURR_UIC,5)  AS  UIC, 

UTA1QCFY,  UTA2QCFY,  UTA3QCFY, 
UTA4QCFY,  UTA1Q1PF,  UTA2Q1PF, 
UTA3Q1PF,  UTA4Q1PF 
FROM  FINANCE 

WHERE  CURR  UIC  o  AND  NPS_IND  =  NULL 
AND  PAYSTAT  =  ’A' 

ORDER  BY  CURR  UIC 
INTO  FINANCEQTR 
INDEX  ON  UIC  as  UIC 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  FPS 


File  Type:  FoxPro  2.6 


Associated  ARIES  Tables:  FPS 


Location:  ACROPOLIS 


Size(MB):  .088 


No.  Records:  1,561 


File  Description: 

FPS  is  used  to  obtain  information  about  the  Cost  to  operate  each  facility  as  well  as  the  Condition  of  each 
Facility.  Used  to  return  a  value  for  the  Cost  per  Square  Foot  and  the  Facility  Condition. 


Required  Data  Elements 


Description 


Facility  Identification  Code 


Condition  of  the  Facility 


Cost  per  Square  Foot  to  Operate  Facility 


Key 

Format  Field 


Char 


Char 


Number 


Extract  Queries: 

FPS_ 

SELECT  FACID,  FAC_COND,  COST  PR  SF 

FROM  FPS 

WHERE  FAC  ID  o 

ORDER  BY  FAC  ID 

INTO  FPS_ 

INDEX  ON  FAC JD  as  FACID,  Primary,  Unique 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  FYxxLQSS _  Location:  ACROPOLIS _ 

File  Type:  FoxPro  2.6 _  Size(MB):  85.4  No.  Records:  260,000 

Associated  ARIES  Tables:  FYxxLQSS,  FYxxXFER _ 

File  Description: 

FYxxLOSS  file  contains  information  about  the  personnel  losses  incurred  by  each  unit  during  a  fiscal  year. 
It  is  used  to  determine  the  Average  Loss  and  Transfer  Rate  of  a  Unit. 


Required  Data  Elements 

i 


Name _ Description  Type  Format  Field 

UIC  Unit  Identification  Code _ Char _ No 

TRMN  Transfer  Reason  Code  Char  No 


Extract  Queries: 

FYxxLOSS 

FYxxXFER 

SELECT  UIC1  AS  UIC,  COUNT(UICl)  AS 

SELECT  UIC1  AS  UIC,  COUNT(UICl)  AS 

UIC  TOTAL 

UIC  TOTAL 

FROM  FY  LOSS 

FROM  FYLOSS 

WHERE  TRMN  =  'LOSS' 

WHERE  TRMN  =  TRFD' 

ORDER  BY  UIC1 

ORDER  BY  UIC1 

GROUP  BY  UIC1 

GROUP  BY  UIC1 

INTO  FYxxLOSS 

INTO  FYxxXFER 

INDEX  ON  UIC  as  UIC,  Primary,  Unique 

INDEX  ON  UIC  as  UIC,  Primary,  Unique 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  G17 


File  Type:  FoxPro  2.6 


Associated  ARIES  Tables:  G 1 7Natl 


Location:  ACROPOLIS 


Size(MB):  3.11 


No.  Records:  5,869 


File  Description: 

G17  file  contains  facility  Unitname,  street  address  data  and  Zip  Code.  It  is  used  as  the  primary  cross 
reference  with  Command  Plan  to  display  facility  information  and  validate  user  input. 


Required  Data  Elements 


Name 

Description 

UIC 

Unit  Identification  Code 

UNITNAME 

Name  of  the  Unit 

TCCCITY 

City  Unit  is  located  in 

TCCSTATE 

State  Unit  is  located  in 

TCCZIP 

Zip  code  of  the  Unit 

TIER 

Code  used  to  determine  if  Unit  is  closing 

RECSTAT 

Recruiting  Station  Code 

TYPEORG 

Type  of  organization 

Extract  Queries: 

G17Natl 

SELECT  UIC,  UNITNAME,  TCCCITY  AS 
CITY,  TCCSTAT  AS  STATE, 
LEFT(TCCZIP,5)  AS  ZIP,  TIER 
FROM  G17 

WHERE  (RECSTAT  o  "1")  AND  (TYPEORG 
o  "2”)  AND  UIC  o  "" 

ORDER  BY  UIC 
INTO  G17Natl 

INDEX  ON  UIC  as  UIC,  Primary,  Unique 


137 


THIS  PAGE  LEFT  INTENTIONALLY  BLANK 


138 


ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  G18CWE  Location:  ACROPOLIS;. .\MapBasic\UsarcDat 

a 


File  Type:  FoxPro  2.6 _  Size(MB):  145.9  No.  Records:  208,416 


Associated  ARIES  Tables:  G18Natl,  G18NatHJIC;  also  Geocoded  for  use  in  Maplnfo 


File  Description: 

G1 8  File  contains  information  about  personnel  in  the  US  Army  Reserves.  It  is  used  in  determining  the 
Total  Number  Assigned  used  in  calculating  the  Loss/Transfer  Rates,  Total  Available  Closing  and  the 
Reassignments  values.  Also  used  to  obtain  a  list  of  the  Zip  Code’s  and  MOSs  of  every  Reservists  with  their 
associated  UIC.. 


Required  Data  Elements 


Name 

Description 

Data 

Type 

Format 

Key 

Field 

UIC 

Unit  Identification  Code  assigned 

Char 

No 

ZIP 

Zip  Code  of  the  individual 

Char 

No 

PRI 

Primary  MOS 

Char 

No 

Extract  Queries: 

G18Natl 

SELECT  UIC,  LEFT(ZIP,5)  AS  ZIPCODE, 
LEFT(PRI,3)  AS  MOS 
FROM  G 1 8_ 

WHERE  PRI  o  AND  UIC  o 
ORDER  BY  UIC 
INTO  G 1 8Natl 
INDEX  ON  UIC  as  UIC 


G18Natl_UIC 

SELECT  UIC,  COUNT(UIC)  AS  UIC_TOTAL 

FROM  G 1 8Natl 

ORDER  BY  UIC 

GROUP  BY  UIC 

INTO  G18Natl_UIC 

INDEX  ON  UIC  as  UIC,  Primary,  Unique 
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7 


ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  G19TRUE _  Location:  ACROPOLIS _ 

File  Type:  FoxPro  2.6 _  Size(MB):  14.4  No.  Records:  233,211 

Associated  ARIES  Tables:  G19Natl _ _ 

File  Description: 

G19  File  contains  information  about  the  required  manning  levels  of  each  Unit.  It  is  used  in  determining 
Average  Area  Manning  for  a  Facility. 


Required  Data  Elements 


Data  Key 

Name  Description  Type  Format  Field 


Extract  Queries: 


G19Natl 

SELECT  OWNUIC  AS  UIC, 

COUNT(OWN_UIC)  AS 
UIC_TOTAL 
FROM  G19 
WHERE  OWN  UIC  o  "" 
ORDER  BY  OWN  UIC 
GROUP  BY  OWN  UIC 
INTO  G19Natl 
INDEX  ON  UIC  as  UIC 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  GEOREF _  Location:  ACROPOLI S ; . .\MapBasic\UsarcData 

File  Type:  FoxPro  2.6 _  Size(MB):  .21 _  No.  Records:  1,553 _ 

Associated  ARIES  Tables:  VALID JUNIT;  also  Geocoded  for  use  in  Maplnfo _ 


File  Description: 

Georef  File  contains  specific  information  about  each  Unit.  It  is  used  to  verify  and  cross  reference  FACID's 
and  UIC  as  well  as  Facility  and  Unit  specific  information. 


Required  Data  Elements 


Name 

Description 

Data 

Type 

Format 

Key 

Field 

FACJD 

Facility  Identification  Code 

Char 

No 

HI!  I  i 

Name  of  the  Facility 

Char 

No 

hbh 

City  the  Facility  is  located  in 

Char 

No 

State  the  Facility  is  located  in 

Char 

No 

WEE3BMMM I 

Zip  Code  of  the  Facility 

Char 

■SM 

Latitude 

Position  of  Facility  by  degree  of  latitude 

Number 

No 

Longitude 

Position  of  Facility  by  degree  of  longitude 

Number 

No 

Extract  Queries: 

VALID JJNIT 

SELECT  FACJD,  FAC  TITLE  AS 

UNITNAME,  FAC_CITY  AS  CITY, 
FAC  STATE  AS  STATE, 
LEFT(FAC_ZIP,5)  AS  ZIP 
FROM  GEOREF 
WHERE  FACJD  o 
ORDER  BY  FACJD 
INTO  VALID  UNIT 
INDEX  ON  FAC  ID  as  FACID 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  INTEREST 


File  Type:  FoxPro  2.6 


Associated  ARIES  Tables:  INTEREST 


Location:  ACROPOLIS 


Size(MB):  4.2 


No.  Records:  3,985 


File  Description: 

Interest  File  contains  information  about  facilities  and  the  date  they  were  acquired.  It  is  used  to  calculate 
the  Facility  Age  for  each  facility.. 


Required  Data  Elements 


Description 


Facility  Identification  Code 


Date  Facility  Acquired 


Key 

Format  Field 


Extract  Queries: 

INTEREST_ 

SELECT  FAC JDSTR  AS  FAC _ID,  DATE  ACQ 
FROM  INTEREST 

WHERE  FAC  IDSTR  o  ""  AND  ABB  TYPE  = 
"USARC  (MB)"  AND  NOT 
ISNULL(DATEACQ) 

ORDER  BY  FAC  JDSTR 
INTO  INTEREST^ 

INDEX  ON  FAC  ID  as  FACID,  Primary,  Unique 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  IRR _  Location:  . .  .\MapBasic\UsarcData _ 

File  Type:  FoxPro  2.6 _  Size(MB):  7.5 _  No.  Records:  140,077 

Associated  ARIES  Tables:  Not  in  ACROPOLIS,  Geocoded  for  use  in  Maplnfo _ 


File  Description: 

IRR  File  contains  information  about  the  individuals  listed  in  the  Individual  Ready  Reserve.  It  is  used  to 
determine  the  value  for  IRR  Available  and  Available  MOS  IRR. 


Required  Data  Elements 


Extract  Queries: 
NONE 
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ARIES  Data  File  Documentation  Form 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  QMA  Location:  . .  AMapBasic\UsarcData 

File  Type:  FoxPro  2.6 _  Size(MB):  2,8 _  No.  Records:  34,265 

Associated  ARIES  Tables:  Not  in  ACROPOLIS,  Geocoded  for  use  in  Maplnfo 


File  Description: 

QMA  File  contains  Census  information.  It  is  used  in  determining  the  value  for  Recruit  Market  for  each 
Facility. 


Required  Data  Elements 


Name 

Description 

Data 

Type 

Format 

Key 

Field 

ZIP 

Zip  Code 

Char 

No 

MWCAT12 

White  Male  Mental  Categories- 1  &2 

Number 

No 

MWCAT3A 

White  Male  Mental  Category  3A 

Number 

No 

MBCAT12 

Black  Male  Mental  Categories  1  &2 

Number 

No 

MBCAT3A 

Black  Male  Mental  Category  3A 

Number 

No 

MHCAT12 

Hispanic  Male  Mental  Categories  1  &2 

Number 

No 

MHCAT3A 

Hispanic  Male  Mental  Category  3A 

Number 

No 

Extract  Queries: 
NONE 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  RPINFODT _  Location:  ACROPOLIS _ 

File  Type:  FoxPro  2.6 _  Size(MB):  14.3 _  No.  Records:  47,159 

Associated  ARIES  Tables:  FPINFODT_ _ 

File  Description: 

RPINFODT  is  a  file  that  contains  information  about  the  backlogged  maintenance  costs  of  each  Facility.  It 
is  used  to  determine  the  amount  of  backlogged  maintenance  is  required  at  the  given  Facility. 


Required  Data  Elements 


Name 

Description 

Data 

Type 

Format 

Key 

Field 

FACJD 

Facility  Identification 

Char 

No 

Total  amount  of  outstanding  Maint.  Actions 

Number 

No 

Extract  Queries: 

RPINFODT_ 

SELECT  FACJD,  SUM(CWE_TOTAL)  AS 
MAINT_COST 
FROM  RPINFODT 
WHERE  FAC_ID  o  "" 

ORDER  BY  FACJD 
GROUP  BY  FACJD 
INTO  RPINFODT^ 

INDEX  ON  FAC  ID  as  FACID,  Primary,  Unique 
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ARIES  Data  File  Documentation  Form 


ARIES  File  Name:  RZA  Location:  . .  .\MapBasic\UsarcData 

File  Type:  FoxPro  2.6  Size(MB):  .16  No.  Records:  1,793 

Associated  ARIES  Tables:  Not  in  ACROPOLIS,  Geocoded  for  use  in  Maplnfo _ 


File  Description: 

RZA  File  contains  information  about  the  location  of  Recruit  Stations.  It  is  used  to  determine  the  distance 
to  the  nearest  Recruit  Station. 


Required  Data  Elements 


Npie 

Description 

Data 

Type 

Format 

Key 

Field 

rsid 

Recruit  Station  Identification  Code 

Char 

No 

name 

Recruit  Station  Title 

Char 

No 

zip 

Zip  Code  of  the  Recruit  Station 

Char 

No 

latitude 

Position  of  Recruit  Station  by  latitude 

Number 

No 

longitude 

Position  of  Recruit  Station  by  longitude 

Number 

No 

Extract  Queries: 
NONE 
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APPENDIX  C.  ARIES  DECISION  MEASURE  STATISTICS 


This  Appendix  contains  the  statistics  calculated  for  the  USARC  data  set.  Validity 
and  Frequency  Statistics  were  calculated  for  17  of  the  20  measures  that  were  not 
dependent  on  knowing  the  identification  of  the  Moving  Unit. 

Individual  percentages  in  the  frequency  distributions  for  the  17  measures  in  this 
appendix  are  percentages  relative  to  total  non-missing  values. 

The  limits  of  the  ranges  for  valid  values  were  determined  by  using  a  rule  of 
reasonableness  to  identify  values  that  would  adversely  affect  the  evaluation  process. 
Consideration  was  given  to  the  following  areas;  (1)  the  range  of  values  returned  during 
the  evaluation  process,  (2)  expected  values  based  on  the  Yield  Curves  and  (3)  common 
sense  (i.e.,  0  value). 


Index 

ARIES  Descriptive  Statistics . 159 

ARIES  Measures  Analysis . 161 

Measure  1.  Facility  Backlogged  Maintenance . 163 

Measure  2.  Facility  Operating  Costs . 165 

Measure  3 .  Facility  Age . 167 

Measure  4.  Facility  Condition . 169 

Measure  5.  Facility  Ownership . 171 

Measure  6.  Competition . 173 

Measure  7.  Average  Area  Drill  Attendance . 175 

Measure  8.  Area  Loss  Rate . 177 

Measure  9.  Area  Transfer  Rate . 179 

Measure  10.  Area  Average  Manning . 181 

Measure  1 1 .  Distance  to  Nearest  Recruit  Station . 183 

Measure  12.  Available  Transfers  from  Closing  Units . 185 

Measure  13.  IRR  Available . 187 

Measure  14.  Recruit  Market . 189 

Measure  16.  Distance  to  Nearest  Area  Maintenance  Support  Activity . 191 

Measure  17.  Distance  to  Nearest  Equipment  Concentration  Site . 193 

Measure  18.  Facility  Weekends  Used . 195 

Query  Time . 197 
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ARIES  Descriptive  Statistics 

1325  U.S.  Army  Reserve  Facilities 


Measure 

Observations 

(N) 

Min 

Value 

Max 

Value 

Mean 

Std 

Deviation 

1  Facility  Backlogged  Maint. 

0 

IHEBtHI 

448,131 

837,391 

2  Facility  Operating  Cost 

1,251 

0.0 

293.5 

3.0865 

9.7124 

3  Facility  Age 

765 

0 

1IM 

295.1 

173.3 

4  Facility  Condition 

1,251 

N/A 

N/A 

N/A 

N/A 

5  Facility  Owned 

1,319 

N/A 

N/A 

N/A 

N/A 

6  Competition 

18 

20,759 

4,116.3 

3,960.0 

7  Area  Drill  Attendance 

0.20 

0.82 

0.58 

0.06 

8  Area  Loss  Rate 

1,325 

0.00 

0.86 

0.32 

0.11 

9  Area  Transfer  Rate 

1,300 

0.00 

1.84 

0.27 

0.20 

10  Area  Average  Manning 

1,325 

0.00 

1.94 

0.86 

0.20 

1 1  Distance  to  Recruiter 

0 

7,619.9 

18.2 

287.7 

12  Area  Avail  Closing  Unit 

819 

1 

504 

75.2 

114.1 

13  IRR  Available 

1 

3,497 

395.8 

658.9 

14  Area  Recruit  Market 

1,316 

253 

33,189.9 

41,290.4 

15  ‘Reassignments 


16  Distance  to  AMSA 

0 

— 

42.4 

289.1 

17  Distance  to  ECS 

1,325 

0 

5,290.9 

268.1 

510.1 

1 8  Facility  Weekends  Used 

0 

3 

1.6 

1.0 

19  *Avail  MOS  Closing  Units 

20  ‘Available  MOS  IRR 


*  Moving  Unit  Specific  Measures 


(Minutes) 

Observations 

(N) 

Min 

Max 

Mean 

Std 

Deviation 

Time  to  Complete  Queries 

1325 

1.7 

76.1 

8.7 

CD 
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ARIES  Measures  Analysis 

1325  U.S.  Army  Reserve  Facilities 


Measure 

Missing 

Out  of 
Range 

Potentially 

Valid 

Valid 

Range 

1  Facility  Backlogged  Maint. 

9.1% 

1.7% 

89.2% 

0  <  xl  <  20M 

2  Facility  Operating  Cost 

5.6% 

18.4% 

76.0% 

3  Facility  Age 

42.3% 

0.2% 

57.6% 

x3  >  0 

4  Facility  Condition 

5.6% 

0.0% 

94.4% 

x4  =  G  or  A  or  R 

5  Facility  Owned 

0.5% 

0.0% 

99.5% 

x5  =  Y  or  N 

6  Competition 

1 .9% 

0.0% 

98.1% 

0<x6<  21,000 

7  Area  Drill  Attendance 

1.9% 

0.0% 

98.1% 

0  <=  x7  <  1 

8  Area  Loss  Rate 

0.0% 

2.1% 

97.9% 

9  Area  Transfer  Rate 

1.9% 

1.1% 

97.0% 

0  <  x9  <  1.0 

10  Area  Average  Manning 

0.0% 

2.3% 

97.7% 

0  <  xIO  <  1.5 

1 1  Distance  to  Recruiter 

0.0% 

0.2% 

99.8% 

| 

12  Area  Available  Closing  Unit 

38.2% 

0.0% 

61.8% 

x12  >=  0 

13  IRR  Available 

0.8% 

0.0% 

99.2% 

x13  >  0 

14  Area  Recruit  Market 

0.7% 

0.0% 

99.3% 

x14  >  0 

1 15  *Reassignments 


16  Distance  to  AMSA 


1 7  Distance  to  ECS 


18  Facility  Weekends  Used 


1 19  ‘Available  MOS  Closing  Units 


1 20  ‘Available  MOS  IRR 


*  Moving  Unit  Specific  Measures 


0.0% 


0.0% 


0.4% 


0.2% 


10.6% 


0.0% 


99.8% 


89.4% 


99.6% 


x16  >  500 


x17  >500 


x18  >  4 
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Measure  1.  Facility  Backlogged  Maintenance 


|  Frequency  Data  1 

Values 

(Millions) 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

0 

23 

1.9 

23 

1.9 

>  0  -  .5 

850 

70.5 

873 

72.4 

>.5-1 

198 

16.4 

1071 

88.9 

>1-1.5 

80 

6.6 

1151 

95.5 

>1.5-2 

27 

2.2 

1178 

97.8 

>2-3 

17 

1.4 

1195 

99.2 

>3-4 

2 

0.2 

1197 

99.3 

>4-5 

0 

0.0 

1197 

99.3 

>5-10 

5 

0.4 

1202 

99.8 

>10-15 

3 

0.2 

1205 

100.0 

>15-20 

0 

0.0 

1205 

100.0 

>20 

0 

0.0 

1205 

100.0 

Total  Non- 
Missing 

1205 

Missing 

120 

Total 

1325 

Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

1205 

0 

11,979,371 

448,130.8 

837,390.9 

Max  Utility:  0 

Min  Utility:  1,000,000 

Databases:  RPINFODT 
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Measure  2.  Facility  Operating  Costs 


Frequency  Data  || 

Values 

(Millions) 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

0 

242 

19.3 

242 

19.3 

>0-2 

465 

37.2 

707 

56.5 

>2-4 

305 

24.4 

1012 

80.9 

>4-6 

97 

7.8 

1109 

88.6 

>6-8 

39 

3.1 

1148 

91.8 

>8-10 

44 

3.5 

1192 

95.3 

>10-20 

41 

3.3 

1233 

98.6 

>20-50 

16 

1.3 

1249 

99.8 

>  50  -100 

0 

0.0 

1249 

99.8 

>  100-200 

1 

0.1 

1250 

99.9 

>  200 

1 

0.1 

1251 

100.0 

Total  Non-Missing 

1251 

Missing 

74 

Total 

1325 

Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

1251 

0 

294 

3.1 

9.7 

Max  Utility:  0 
Min  Utility:  100 
Databases:  FPS 
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Measure  3.  Facility  Age 


Frequency  Data  |j 

Values 

(Months) 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

<0 

0 

0.0 

0 

0.0 

0 

2 

0.3 

2 

0.3 

0-100 

81 

10.6 

83 

10.8 

101  -  200 

33 

CO 

116 

15.2 

201  -  300 

361 

47.2 

477 

62.4 

301  -  400 

29 

3.8 

506 

66.1 

401  -  500 

185 

24.2 

691 

90.3 

501  -  750 

70 

9.2 

761 

99.5 

751  - 1000 

1 

0.1 

762 

99.6 

1001  -1500 

2 

0.3 

764 

99.9 

—  u.imi.1.1.^ 

1 

0.1 

765 

100.0 

0 

0.0 

765 

100.0 

Total  Non-Missing 

765 

Missing 

560 

Total 

1325 

Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

765 

0 

1,677 

295.1 

173.3 

Max  Utility:  0 

Min  Utility:  1,200 

Databases:  INTEREST 
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Measure  4.  Facility  Condition 


j  Frequency  Data  jj 

Values 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

GREEN 

1251 

100.0 

1251 

106.3 

AMBER 

0 

0.0 

1251 

106.3 

RED 

0 

0.0 

1251 

106.3 

Total  Non-Missing 

1251 

Missing 

74 

Total 

1251 

Green  Utility: 

1.0 

Amber  Utility: 

0.5 

Red  Utility: 

0.0 

Databases: 

FPS 
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Measure  5.  Facility  Ownership 


|  Frequency  Data  1 

Values 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

Y 

1110 

83.8 

1110 

84.2 

N 

209 

15.8 

1319 

100.0 

Total  Non-Missing 

1319 

Missing 

6 

Total 

1325 

Max  Utility:  YES 

Min  Utility:  NO 

Databases:  COMPLEX 
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Measure  6.  Competition 


|  Freq 

uency  Data  |j 

Values 

(People) 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

0 

0 

0.0 

0 

0.0 

309 

23.8 

309 

23.8 

210 

16.2 

519 

39.9 

| 

166 

12.8 

685 

52.7 

129 

9.9 

814 

62.6 

1 

52 

4.0 

866 

66.6 

5,001  -  7,500 

221 

17.0 

1087 

83.6 

| 

80 

6.2 

1167 

89.8 

99 

7.6 

1266 

97.4 

31 

2.4 

1297 

99.8 

3 

0.2 

1300 

100.0 

Total  Non-Missing 

1300 

Missing 

25 

Total 

1325 

i 

Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

1300 

18 

20,759 

4,116.3 

3,960.0 

Max  Utility:  0 

Min  Utility:  10,000 

Databases:  COMMAND  PLAN,  G17,  G19TRUE,  GEOREF,  NGNON_CL 
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Measure  7.  Average  Area  Drill  Attendance 


0.01  - 

0.10 

0.11  - 

0.20 

0.21  - 

0.30 

0.31  - i 

0.40 

0.41  - 1 

0.50 

0.51  - 1 

0.60 

0.61  -  ( 

3.70 

0.71  -  ( 

3.80 

0.81  -  ( 

).90 

0.91  -  ( 

>.99 

>=  1.0 

Total  Non-Missing 

Missing 

Total 

N 


1300 


Frequency 


0 


0 


0 


703 


472 


32 


1 


0 


0 


1300 


25 


1325 


Percent 


0.0 


0.0 


0.0 


0.3 


1.1 


5.7 


54.1 


36.3 


2.5 


0.1 


0.0 


0.0 


Descriptive  Statistics 


Cumulative 
Frequency 


0 


18 


92 


795 


1267 


1299 


1300 


1300 


1300 


Mean 


0.58 


Cumulative 

Percent 


0.0 


0.0 


0.0 


0.3 


1.4 


7.1 


61.2 


97.5 


99.9 


100.0 


100.0 


100.0 


Std  Dev 
0.06 


Max  Utility:  1 
Min  Utility:  0 

Databases:  COMMAND  PLAN,  FINANCE,  G17,  G19TRUE,  GEOREF 
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Measure  8.  Area  Loss  Rate 


|  Frequency  Data  || 

Values 

(Percent) 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

0 

28 

2.1 

28 

2.1 

0.01-0.10 

11 

0.8 

39 

2.9 

0.11  -0.20 

44 

3.3 

83 

6.3 

0.21  -  0.30 

494 

37.3 

577 

43.5 

0.31  -  0.40 

555 

41.9 

1132 

85.4 

0.41  -  0.50 

144 

1276 

96.3 

0.51  -  0.60 

21 

1.6 

1297 

97.9 

0.61  -  0.70 

17 

1.3 

1314 

99.2 

0.71  -  0.80 

7 

0.5 

1321 

99.7 

0.81  -  0.90 

4 

0.3 

1325 

100.0 

0.91  -  0.99 

0 

0.0 

1325 

100.0 

>=  1.0 

0 

0.0 

1325 

100.0 

Total  Non-Missing 

1325 

Missing 

0 

Total 

1325 

Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

1325 

0.00 

0.86 

0.32 

0.11 

Max  Utility:  0 
Min  Utility:  1 

Databases:  COMMAND  PLAN,  FYxxLOSS,  G17,  G18CWE,  GEOREF 
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Measured.  Area  Transfer  Rate 


Frequency  Data 

Values 

(Percent) 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

0 

7 

0.5 

7 

0.5 

0.01  -0.10 

193 

14.8 

200 

15.4 

0.11-0.20 

379 

29.2 

579 

44.5 

0.21  -  0.30 

279 

21.5 

858 

66.0 

0.31  -  0.40 

224 

17.2 

1082 

83.2 

0.41  -  0.50 

71 

5.5 

1153 

88.7 

0.51  -  0.60 

44 

3.4 

1197 

92.1 

0.61  -  0.70 

34 

2.6 

1231 

94.7 

0.71  -  0.80 

15 

1.2 

1246 

95.8 

0.81  -  0.90 

42 

3.2 

1288 

99.1 

0.91  -  0.99 

4 

0.3 

1292 

99.4 

>=  1.0 

8 

0.6 

1300 

100.0 

Total  Non-Missing 

1300 

Missing 

25 

Total 

1325 

Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

1300 

0.00 

1.84 

0.27 

0.20 

Max  Utility:  0 
Min  Utility:  >=  1 

Databases:  COMMAND  PLAN,  FYxxLOSS,  G17,  G18CWE,  GEOREF 
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Measure  10.  Area  Average  Manning 


Frequency  Data 

Values 

(Percent) 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

0 

25 

1.9 

25 

1.9 

0.01  -  0.50 

22 

1.7 

47 

3.5 

0.51  -  0.75 

183 

13.8 

230 

17.4 

0.76  -  0.80 

124 

CD 

4^ 

354 

26.7 

0.81  -  0.90 

355 

26.8 

709 

53.5 

o 

to 

I 

© 

o 

455 

34.3 

1164 

87.8 

1.01  -1.20 

137 

10.3 

1301 

98.2 

1.21  -1.40 

18 

1.4 

1319 

99.5 

1.41  -1.60 

0 

0.0 

1319 

99.5 

1.61  -1.80 

5 

0.4 

1324 

99.9 

1.81  -1.90 

0 

0.0 

1324 

99.9 

1.91-2.0 

1 

0.1 

1325 

100.0 

>  2.0 

0 

0.0 

1325 

100.0 

Total  Non-Missing 

1325 

.  '• '  . 

■  ‘  ‘  ■ 

Missing 

0 

Total 

1325 

Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

1325 

0.00 

1.94 

0.86 

0.20 

Max  Utility:  1.25 

Min  Utility:  0 

Databases:  COMMAND  PLAN,G17,  G18CWE,  G19TRUE,  GEOREF 
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Measure  1 1 .  Distance  to  Nearest  Recruit  Station 


|  Frequency  Data  j 

Values 

(Miles) 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 
-  Percent 

0 

2 

0.2 

2 

0.2 

>0-10 

1098 

82.9 

1100 

83.0 

>10-20 

110 

8.3 

1210 

91.3 

>20-30 

76 

5.7 

1286 

97.1 

>30-40 

25 

1.9 

1311 

98.9 

>40-50 

7 

0.5 

1318 

99.5 

>50-75 

3 

0.2 

1321 

99.7 

>  75  - 100 

2 

0.2 

1323 

99.8 

>100-200 

0 

0.0 

1323 

99.8 

>  200  -  300 

0 

0.0 

1323 

99.8 

>300 

2 

0.2 

1325 

100.0 

Total  Non-Missing 

1325 

Missing 

0 

Total 

1325 

Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

1325 

0.0 

7,619.9 

18.2 

287.7 

Max  Utility:  0 
Min  Utility:  >=100 
Databases:  RZA 
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Measure  12.  Available  Transfers  from  Closing  Units 


Freq 

uency  Data  j{ 

Value 

(People) 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

0 

0 

0.0 

0 

0.0 

1  -50 

509 

62.1 

509 

62.1 

51  -100 

143 

17.5 

652 

79.6 

101  -150 

44 

5.4 

696 

85.0 

151  -200 

40 

4.9 

736 

89.9 

201  -  250 

7 

0.9 

743 

90.7 

251  -  300 

12 

1.5 

755 

92.2 

301  -  350 

15 

1.8 

770 

94.0 

351  -  400 

7 

0.9 

777 

94.9 

401  -  500 

41 

5.0 

818 

99.9 

>500 

1 

0.1 

819 

100.0 

Total  Non-Missing 

819 

Missing 

506 

Total 

1325 

Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

819 

1 

504 

75.2 

114.1 

Max  Utility:  250 
Min  Utility:  0 

Databases:  COMMAND  PLAN,  G17,  G18CWE,  GEOREF 
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Measure  13.  IRR  Available 


Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

1315 

1 

3,497 

395.8 

658.9 

Max  Utility:  >=10,000 
Min  Utility:  0 
Databases:  IRR 
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Measure  14.  Recruit  Market 


Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

:  1316 

253 

214,738 

33,189.9 

41,290.4 

Max  Utility:  >=  250,000 

Min  Utility:  0 

Databases:  QMA 
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Measure  16.  Distance  to  Nearest  Area  Maintenance  Support  Activity 


Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

1325 

0.0 

7,619.9 

42.4 

289.1 

Max  Utility:  0 

Min  Utility:  >=  500 

Databases:  AMSA 
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Measure  17.  Distance  to  Nearest  Equipment  Concentration  Site 


Frequency  Data 


Values 

(Miles) 


>0-10 


>10-20 


>20-30 


>30-40 


>40-50 


>50-75 


>  75  - 100 


> 100-200 


>  200  -  300 


>  300 


Total  Non-Missing 


Missing 


Total 


Frequency 


84 


43 


38 


45 


54 


41 


129 


151 


401 


178 


161 


1325 


0 


1325 


Percent 


6.3 


3.2 


2.9 


Cumulative 

Frequency 


84 


127 


165 


210 


264 


305 


434 


585 


986 


1164 


1325 


Cumulative 

Percent 


6.3 


9.6 


12.5 


15.8 


19.9 


23.0 


32.8 


44.2 


74.4 


87.8 


100.0 


Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

1325 

0.0 

5,290.9 

268.1 

510.1 
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Measure  18.  Facility  Weekends  Used 


|  Frequency  Data 

Values 

Frequency 

Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

0 

237 

18.0 

237 

18.0  j 

1 

430 

32.6 

667 

50.5 

2 

331 

25.1 

998 

75.6 

3 

322 

24.4 

1320 

100.0 

4 

0 

0.0 

1320 

100.0  1 

Total  Non-Missing 

1320 

Missing 

5 

Total 

1325 

Descriptive  Statistics 

N 

Min 

Max 

Mean 

Std  Dev 

1320 

0 

3 

1.6 

1.0 

Max  Utility:  0 

Min  Utility:  4 


Databases:  COMPLEX 
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Query  Time 


Frequency  Data 


Values 

(Minutes) 


>0-2 


-6 


>6-8 


>8-10 


>10-12 


>12-15 


>15-20 


>20-30 


>30 


Total 


Frequency 


193 


343 


239 


175 


129 


117 


50 


44 


28 


1325 


Percent 

Cumulative 

Frequency 

Cumulative 

Percent 

0.5 

7 

0.5 

14.6 

200 

15.1 

25.9 

543 

41.0 

18.0 

782 

59.0 

13.2 

957 

72.2 

9.7 

1086 

82.0 

8.8 

1203 

90.8 

3.8 

1253 

94.6 

3.3 

1297 

97.9 

2.1 

1325 

100.0 

N 


1325 


Descriptive  Statistics 


Mean 


8.7 


Std  Dev 
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APPENDIX  D.  SOURCE  FILE  DOCUMENTATION  FORMS 

This  appendix  contains  the  recommended  forms  to  be  used  in  gathering  meta-data 
information  for  data  files  used  in  conjunction  with  developing  a  SDSS  application. 
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Source  File  Documentation  Form 


Customer  Path  &  File  Name: 

LAN  Server: 

Point  of  Contact(Office  &  Phone): 

File  Type: 

Size(MB):  No.  Records: 

Source  File  Name: 

Update  Frequency: 

Source  File  Location: 

POC:  Phone: 

Application  File  Name: 

File  Description: 

Queries  Used: 


Page  1  of 2 
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Source  File  Documentation  Form 


Source  File  Name:  _  Location: 

File  Type:  _ _  Size(MB):  No.  Records: 

Associated  Tables: 

File  Description:  ™  — 


Required  Data  Elements 

Data  Key 

Name  Description  Type  Format  Field 
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